2019独角兽企业重金招聘Python工程师标准>>>

Operating Systems Lecture Notes
Lecture 5
Implementing Synchronization Operations

Martin C. Rinard

How do we implement synchronization operations like locks? Can build synchronization operations out of atomic reads and writes. There is a lot of literature on how to do this, one algorithm is called the bakery algorithm. But, this is slow and cumbersome to use. So, most machines have hardware support for synchronization - they provide synchronization instructions.
On a uniprocessor, the only thing that will make multiple instruction sequences not atomic is interrupts. So, if want to do a critical section, turn off interrupts before the critical section and turn on interrupts after the critical section. Guaranteed atomicity. It is also fairly efficient. Early versions of Unix did this.
Why not just use turning off interrupts? Two main disadvantages: can't use in a multiprocessor, and can't use directly from user program for synchronization.
Test-And-Set. The test and set instruction atomically checks if a memory location is zero, and if so, sets the memory location to 1. If the memory location is 1, it does nothing. It returns the old value of the memory location. You can use test and set to implement locks as follows:
- The lock state is implemented by a memory location. The location is 0 if the lock is unlocked and 1 if the lock is locked.
- The lock operation is implemented as:
```
while (test-and-set(l) == 1);
```
- The unlock operation is implemented as: *l = 0;
The problem with this implementation is busy-waiting. What if one thread already has the lock, and another thread wants to acquire the lock? The acquiring thread will spin until the thread that already has the lock unlocks it.
What if the threads are running on a uniprocessor? How long will the acquiring thread spin? Until it expires its quantum and thread that will unlock the lock runs. So on a uniprocessor, if can't get the thread the first time, should just suspend. So, lock acquisition looks like this:
```
while (test-and-set(l) == 1) { currentThread->Yield();
}
```
Can make it even better by having a queue lock that queues up the waiting threads and gives the lock to the first thread in the queue. So, threads never try to acquire lock more than once.
On a multiprocessor, it is less clear. Process that will unlock the lock may be running on another processor. Maybe should spin just a little while, in hopes that other process will release lock. To evaluate spinning and suspending strategies, need to come up with a cost for each suspension algorithm. The cost is the amount of CPU time the algorithm uses to acquire a lock.
There are three components of the cost: spinning, suspending and resuming. What is the cost of spinning? Waste the CPU for the spin time. What is cost of suspending and resuming? Amount of CPU time it takes to suspend the thread and restart it when the thread acquires the lock.
Each lock acquisition algorithm spins for a while, then suspends if it didn't get the lock. The optimal algorithm is as follows:
- If the lock will be free in less than the suspend and resume time, spin until acquire the lock.
- If the lock will be free in more than the suspend and resume time, suspend immediately.
Obviously, cannot implement this algorithm - it requires knowledge of the future, which we do not in general have.
How do we evaluate practical algorithms - algorithms that spin for a while, then suspend. Well, we compare them with the optimal algorithm in the worst case for the practical algorithm. What is the worst case for any practical algorithm relative to the optimal algorithm? When the lock become free just after the practical algorithm stops spinning.
What is worst-case cost of algorithm that spins for the suspend and resume time, then suspends? (Will call this the SR algorithm). Two times the suspend and resume time. The worst case is when the lock is unlocked just after the thread starts the suspend. The optimal algorithm just spins until the lock is unlocked, taking the suspend and resume time to acquire the lock. The SR algorithm costs twice the suspend and resume time -it first spins for the suspend and resume time, then suspends, then gets the lock, then resumes.
What about other algorithms that spin for a different fixed amount of time then block? Are all worse than the SR algorithm.
- If spin for less than suspend and resume time then suspend (call this the LT-SR algorithm), worst case is when lock becomes free just after start the suspend. In this case the the algorithm will cost spinning time plus suspend and resume time. The SR algorithm will just cost the spinning time.
- If spin for greater than suspend and resume time then suspend (call this the GR-SR algorithm), worst case is again when lock becomes free just after start the suspend. In this case the SR algorithm will also suspend and resume, but it will spin for less time than the GT-SR algorithm
Of course, in practice locks may not exhibit worst case behavior, so best algorithm depends on locking and unlocking patterns actually observed.

Here is the SR algorithm. Again, can be improved with use of queueing locks.

notDone = test-and-set(l);
if (!notDone) return;
start = readClock();
while (notDone) { stop = readClock();if (stop - start >= suspendAndResumeTime) { currentThread->Yield();start = readClock();}notDone = test-and-set(l);
}

There is an orthogonal issue. test-and-set instruction typically consumes bus resources every time. But a load instruction caches the data. Subsequent loads come out of cache and never hit the bus. So, can do something like this for inital algorithm:
```
while (1) { if !test-and-set(l) break;while (*l == 1);
}
```
Are other instructions that can be used to implement spin locks - swap instruction, for example.
On modern RISC machines, test-and-set and swap may cause implementation headaches. Would rather do something that fits into load/store nature of architecture. So, have a non-blocking abstraction: Load Linked(LL)/Store Conditional(SC).
Semantics of LL: Load memory location into register and mark it as loaded by this processor. A memory location can be marked as loaded by more than one processor.
Semantics of SC: if the memory location is marked as loaded by this processor, store the new value and remove all marks from the memory location. Otherwise, don't perform the store. Return whether or not the store succeeded.
Here is how to use LL/SC to implement the lock operation:
```
while (1) { LL  r1, lockif (r1 == 0) { LI r2, 1if (SC r2, lock) break;}}
```
Unlock operation is the same as before.
Can also use LL/SC to implement some operations (like increment) directly. People have built up a whole bunch of theory dealing with the difference in power between stuff like LL/SC and test-and-set.
```
while (1) { LL   r1, lockADDI r1, 1, r1if (SC r2, lock) break;}
```
Note that the increment operation is non-blocking. If two threads start to perform the increment at the same time, neither will block - both will complete the add and only one will successfully perform the SC. The other will retry. So, it eliminates problems with locking like: one thread acquires locks and dies, or one thread acquires locks and is suspended for a long time, preventing other threads that need to acquire the lock from proceeding.

Permission is granted to copy and distribute this material for educational purposes only, provided that the following credit line is included: "Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard." Permission is granted to alter and distribute this material provided that the following credit line is included: "Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard."

转载于:https://my.oschina.net/fuckmylife0/blog/841334

Implementing Synchronization Operations相关推荐

【论文阅读】SyncPerf: Categorizing, Detecting, and Diagnosing Synchronization Performance Bugs
本次是初步写论文记录,以翻译为主,后续会更改为只讲述核心思想. 欢迎访问 Github :https://github.com/MercuryLc/paper_reading SyncPerf: Ca ...
ARM Context synchronization event和Instruction Synchronization Barrier
在Arm architecture里,经常提到Context synchronization event(CSE)和Explicit synchronization,Context synchroni ...
IMAP协议RFC3501中文文档
因特网邮件访问协议,版本4rev1(IMAP4rev1)允许一个客户端访问和操作在一个服务器上的电子邮件.IMAP4rev1允许,以一种功能上等效于本地文件夹的方式,操作邮箱(远程邮件文件夹).IM ...
IMAP协议RFC3501中文文档 .
IMAP协议RFC3501中文文档 . 分类: 各类协议标准文档 2011-05-18 09:48 1238人阅读评论(0) 收藏举报因特网邮件访问协议,版本4rev1(IMAP4rev1)允许 ...
淘宝用了mysql,您呢？
淘宝内部分享:怎么跳出MySQL的10个大坑编者按:淘宝自从2010开始规模使用MySQL,替换了之前商品.交易.用户等原基于IOE方案的核心数据库,目前已部署数千台规模.同时和Oracle, ...
CodeForces - 1110E-Magic Stones（差分+思维）
Grigory has nn magic stones, conveniently numbered from 11 to nn. The charge of the ii-th stone is e ...
#翻译NO.4# --- Spring Integration Framework
为什么80%的码农都做不了架构师?>>> Part III. Core Messaging This section covers all aspects of the cor ...
Java 中的 Reference
1.强引用(StrongReference) 强引用不会被GC回收,并且在java.lang.ref里也没有实际的对应类型.举个例子来说: Object obj = new Object(); 这里的 ...
mysql 5.6 binlog组提交
mysql 5.6 binlog组提交实现原理 http://blog.itpub.net/15480802/viewspace-1411356 Redo组提交 Redo提交流程大致如下 lock l ...

Implementing Synchronization Operations

Operating Systems Lecture Notes
Lecture 5
Implementing Synchronization Operations

Implementing Synchronization Operations相关推荐

最新文章

热门文章

Implementing Synchronization Operations

Operating Systems Lecture Notes Lecture 5 Implementing Synchronization Operations

Implementing Synchronization Operations相关推荐

最新文章

热门文章

Operating Systems Lecture Notes
Lecture 5
Implementing Synchronization Operations