barrier (wmb, mb, rmb) and cache coherence

article/2025/9/23 19:56:43

http://www.linuxforum.net/forum/gshowflat.php?Cat=&Board=linuxK&Number=428239&page=5&view=collapsed&sb=5&o=all&fpart=

Note: "barrier" here means wmb, rmb and mb.

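I have never managed to find good material explaining the relationship between barriers and cache coherence. Books such as <<LDD>> and <<ULK>> cover the basic usage of barriers. LDD concentrates on the role barriers play when talking to peripherals. Another example uses barriers to implement lock-free linked-list operations.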

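I am puzzled by this kind of lock-free operation built on barriers. Another example is the big reader lock (brlock.c, brlock.h). I cannot find any rationale or set of conditions that identifies the situations where a barrier is mandatory. Then there is void init_bh(int nr, void (*routine)(void)) in softirq.c, which also uses a barrier.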

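It should be like this: a barrier forces the CPU to apply strong ordering, whereas cache coherence is concerned with keeping the copies of memory held in the caches of multiple CPUs consistent. The questions are:

Does cache coherence need barriers to take part?
Which techniques cause a CPU to reorder memory reads and writes? The write buffer is one, but is the cache still coherent while data sits in the write buffer? (It should be. If so, barriers and cache coherence are two entirely separate matters: cache coherence is completely transparent to software, whereas barriers need software to take part.)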

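I think a barrier needs to be considered in the following situations: two CPUs both operate on the same data (writes need mutual exclusion), or they read and write two pieces of data that are related in some way, for example when the value of one decides what is done to the other.
This description is far from satisfying, and it is pure reasoning (though it does make sense: per-CPU data never needs a barrier, and reordering does not affect a single CPU. (fix me)).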

Also, when barriers come up, some books say that they "make a change immediately visible to all CPUs". That wording may not be right. How should we understand this?

 


Why is there an mb() here?

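My understanding is about the same as yours:
1. Cache coherence does not need barriers; it is guaranteed entirely by the hardware protocol.
2. Other techniques that cause a CPU to reorder include load forwarding and load scheduling.
3. A memory ordering problem only exists when at least two memory addresses and two functional units that access memory (CPUs or peripherals) are involved.
4. "Make a change immediately visible to all CPUs" is not right. It should be: make the order in which two memory operations become visible to other CPUs satisfy some requirement (release, acquire or fence).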
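Point 4 can be sketched in user space with C11 atomics. This is only an illustration under assumptions: `publish`, `try_consume`, `payload` and `ready` are made-up names, and C11 release/acquire stands in for the kernel barriers discussed in this thread. The writer stores the data and then sets a flag with release ordering; the reader checks the flag with acquire ordering before touching the data. Nothing here makes the write visible "immediately"; it only constrains the order in which the two writes can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* ordinary data, written before the flag */
static atomic_bool ready;  /* flag that publishes the payload (starts false) */

/* Writer side: the release store orders the payload write before the flag. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: the acquire load orders the flag read before the payload read.
 * Returns true and fills *out only if the flag was observed set. */
static bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In real use the two functions would run on different threads or CPUs; if the reader sees the flag set, it is also guaranteed to see the payload written before it.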

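Reordering can affect a single CPU as well.
A course I took last semester, "Advanced Computer System Architecture", covered a classic example.
On PowerPC, a load instruction that immediately follows a store can execute out of order with respect to it. The following program goes wrong if the memory barrier is left out: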

1 while ( TDRE == 0 );
2 TDR = char1;
3 asm("eieio"); // memory barrier
4 while ( TDRE == 0 );
5 TDR = char2;

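Explanation: assume TDRE is the status register of some peripheral. 0 means the device is busy; 1 means the device can accept and process one character. The user can then write the character to be processed into the device's TDR register. After the write, TDRE becomes 0; the device needs some time to process the character, and once that time has passed TDRE changes from 0 back to 1.

Suppose line 3 is removed. The program waits on line 1 until TDRE is 1 and then continues. But a store (TDR = char1;) is then immediately followed by a load ( while ( TDRE == 0 ); ), so the load in line 4 may execute first, before the store in line 2. The value loaded in line 4 is then certainly 1, so line 5 executes at once, and the device receives the second character while it is still busy, which is bound to go wrong.
So line 3 must be added, to make sure the store executes before the load.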
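The polling pattern above can be written as a small helper. This is a rough user-space sketch, not driver code: `struct uart_regs`, `uart_putc` and `io_barrier` are hypothetical names, the register block is an ordinary struct rather than mapped device memory, and a C11 full fence is used only as a stand-in for the `eieio` instruction in the listing (on real PowerPC hardware the barrier would be `eieio` itself).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout; a real driver would map device memory. */
struct uart_regs {
    volatile uint32_t tdre;  /* 1 = ready for a character, 0 = busy */
    volatile uint32_t tdr;   /* transmit data register */
};

/* Stand-in for PowerPC "eieio" (assumption): a full fence from C11. */
static inline void io_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Send one character: wait for the device, store, then order the store
 * before any later load of TDRE. */
static void uart_putc(struct uart_regs *regs, char c)
{
    assert(regs != NULL);
    while (regs->tdre == 0)
        ;                                   /* wait until the device is ready */
    regs->tdr = (uint32_t)(unsigned char)c; /* store to TDR */
    io_barrier();                           /* must come before the next TDRE load */
}
```

Putting the barrier inside the helper means two back-to-back calls reproduce lines 1 to 5 of the listing without the caller having to remember line 3.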

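The peripheral problem caused by reordering (not an SMP CPU problem) is easy to understand. The key point is that its effect on program design in an SMP environment is much less intuitive.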

If our thinking is right, we should go and look at init_bh and ksoftirqd. But that does not look very intuitive either.

I am also eager to understand this area.
I also cannot follow the interrupt chapter of LDD.


This code, though simple, represents the typical job of an interrupt handler. It, in
turn, calls short_incr_bp, which is defined as follows:
static inline void short_incr_bp(volatile unsigned long *index, int delta)
{
    unsigned long new = *index + delta;
    barrier(); /* Don't optimize these two together */
    /* Poster's question: why is barrier() used here? */
    *index = (new >= (short_buffer + PAGE_SIZE)) ? short_buffer : new;
}
This function has been carefully written to wrap a pointer into the circular buffer
without ever exposing an incorrect value. By assigning only the final value and
placing a barrier to keep the compiler from optimizing things, it is possible to
manipulate the circular buffer pointers safely without locks.
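For readers who want to try the idea outside the kernel, here is a small user-space sketch. It assumes GCC or Clang, where a compiler barrier can be written as an empty asm statement with a "memory" clobber; `BUF_SIZE` and `incr_index` are hypothetical names, and the index is a plain offset rather than an address as in the quoted code. The barrier emits no CPU instruction. It only stops the compiler from merging or moving memory accesses across it, so the new value is computed first and `*index` receives exactly one store with the final, already wrapped value.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier (GCC/Clang): no instruction is emitted, but the
 * "memory" clobber forbids moving memory accesses across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

#define BUF_SIZE 4096UL  /* hypothetical stand-in for PAGE_SIZE */

/* Advance *index by delta, wrapping to 0 at the end of the buffer.
 * Only the final value is ever stored to *index. */
static inline void incr_index(volatile unsigned long *index, unsigned long delta)
{
    unsigned long next;

    assert(index != NULL);
    next = *index + delta;
    barrier();  /* keep the computation and the single store separate */
    *index = (next >= BUF_SIZE) ? 0 : next;
}
```

A concurrent reader of `*index` therefore sees either the old value or the new wrapped value, never an out-of-range intermediate one.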

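Experts, please advise.
In Linux, atomic read-modify-write operations that return a value imply a memory barrier, and also a compiler barrier (which stops gcc from reordering instructions). On x86 this means operations such as test_and_set_bit are implemented with volatile asm, the lock prefix and a "memory" clobber.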

The reason is easy to see:

spin_lock(lock);
some read/write
………..

Without an mb, some of the reads/writes could end up executing before spin_lock, which of course must not be allowed.
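The same requirement can be shown with a toy user-space lock. This is a minimal sketch under assumptions, not the kernel's spin_lock: `tiny_lock`, `tiny_lock_acquire`, `tiny_lock_release` and `locked_increment` are made-up names, and C11 acquire/release ordering plays the role of the barrier implied by the atomic test-and-set.

```c
#include <assert.h>
#include <stdatomic.h>

struct tiny_lock {
    atomic_flag held;
};

static void tiny_lock_acquire(struct tiny_lock *l)
{
    /* Test-and-set with acquire ordering: accesses in the critical
     * section cannot be reordered to before the lock is taken. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin until the previous holder clears the flag */
}

static void tiny_lock_release(struct tiny_lock *l)
{
    /* Release ordering: accesses in the critical section cannot be
     * reordered to after the unlock. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static struct tiny_lock lock = { ATOMIC_FLAG_INIT };
static long counter;  /* protected by lock */

/* Increment the shared counter under the lock and return the new value. */
static long locked_increment(void)
{
    long seen;

    tiny_lock_acquire(&lock);
    seen = ++counter;           /* the "some read/write" from the post */
    tiny_lock_release(&lock);
    return seen;
}
```

If the test-and-set carried no ordering, the increment could be performed before the lock was actually owned, which is exactly the failure described above.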

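Here is an article on implementing lock-free searching. It is very helpful for understanding barriers, and studying it is far more valuable than translating it.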




Data Dependencies and wmb()

Version 1.0







Goal

Support lock-free algorithms without inefficient and ugly read-side code.

Obstacle

Some CPUs do not support synchronous invalidation in hardware.





Example Code

Insertion into an unordered lock-free circular singly linked list, while allowing concurrent searches.





Data Structures


The data structures used in all these examples are a list element, a header element, and a lock.





struct el {
    struct el *next;
    long key;
    long data;
};

struct el head;
spinlock_t mutex;





Search and Insert Using Global Locking



The familiar globally locked implementations of search() and insert() are as follows:



struct el *insert(long key, long data)
{
    struct el *p;

    p = kmalloc(sizeof(*p), GFP_ATOMIC);
    if (p == NULL)
        return NULL;    /* allocation failed */
    spin_lock(&mutex);
    p->next = head.next;
    p->key = key;
    p->data = data;
    head.next = p;
    spin_unlock(&mutex);
    return p;
}

struct el *search(long key)
{
    struct el *p;

    p = head.next;
    while (p != &head) {
        if (p->key == key) {
            return (p);
        }
        p = p->next;
    }
    return (NULL);
}

/* Example use. */

spin_lock(&mutex);
p = search(key);
if (p != NULL) {
    /* do stuff with p */
}
spin_unlock(&mutex);



These implementations are quite straightforward, but are subject to locking bottlenecks.



Search and Insert Using wmb() and rmb()



The existing wmb() and rmb() primitives can be used to do lock-free insertion. The searching task will either see the new element or not, depending on the exact timing, just like the locked case. In any case, we are guaranteed not to see an invalid pointer, regardless of timing, again, just like the locked case. The problem is that wmb() is guaranteed to enforce ordering only on the writing CPU -- the reading CPU must use rmb() to keep the ordering.





struct el *insert(long key, long data)
{
    struct el *p;

    p = kmalloc(sizeof(*p), GFP_ATOMIC);
    if (p == NULL)
        return NULL;    /* allocation failed */
    spin_lock(&mutex);
    p->next = head.next;
    p->key = key;
    p->data = data;
    wmb();              /* fields must be written before the element is linked in */
    head.next = p;
    spin_unlock(&mutex);
    return p;
}

struct el *search(long key)
{
    struct el *p;

    p = head.next;
    while (p != &head) {
        rmb();          /* order the pointer load before the field loads */
        if (p->key == key) {
            return (p);
        }
        p = p->next;
    }
    return (NULL);
}



(Note: see read-copy update for information on how to delete elements from this list while still permitting lock-free searches.)
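The wmb()/rmb() pairing can be tried in user space with C11 atomics. The following is a sketch under assumptions rather than kernel code: `struct node`, `list_init`, `list_insert` and `list_search` are hypothetical names, `malloc` replaces `kmalloc`, writers are assumed to be serialized by a lock that is omitted here, and a release store / acquire load pair stands in for wmb() / rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    struct node *_Atomic next;
    long key;
    long data;
};

static struct node list_head;

/* Circular list: an empty list has list_head.next pointing at list_head. */
static void list_init(void)
{
    atomic_init(&list_head.next, &list_head);
}

/* Writers are assumed to be serialized by a lock (omitted in this sketch). */
static struct node *list_insert(long key, long data)
{
    struct node *p = malloc(sizeof(*p));

    if (p == NULL)
        return NULL;
    p->key = key;
    p->data = data;
    atomic_init(&p->next,
                atomic_load_explicit(&list_head.next, memory_order_relaxed));
    /* Release store: plays the role of "wmb(); head.next = p;". */
    atomic_store_explicit(&list_head.next, p, memory_order_release);
    return p;
}

static struct node *list_search(long key)
{
    /* Acquire loads: play the role of "load the pointer; rmb();". */
    struct node *p = atomic_load_explicit(&list_head.next, memory_order_acquire);

    while (p != &list_head) {
        if (p->key == key)
            return p;
        p = atomic_load_explicit(&p->next, memory_order_acquire);
    }
    return NULL;
}
```

A searcher that observes the new pointer is then guaranteed to observe the key, data and next fields written before it was published, which is the property the text describes.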





The rmb()s in search() cause unnecessary performance degradation on CPUs (such as the i386, IA64, PPC, and SPARC) where data dependencies result in an implied rmb(). In addition, code that traverses a chain of pointers would have to be broken up in order to insert the needed rmb()s. For example:



    d = p->next->data;

would have to be rewritten as:

    q = p->next;
    rmb();
    d = q->data;



One could imagine much uglier code expansion where there are more dereferences in a single expression. The inefficiencies and code bloat could be avoided if there were a primitive like wmb() that allowed read-side data dependencies to act as implicit rmb() invocations.





Why do You Need the rmb()?



Many CPUs have single instructions that cause other CPUs to see preceding stores before subsequent stores, without the reading CPUs needing an explicit rmb() if a data dependency forces the ordering.

However, some CPUs have no such instruction, the Alpha being a case in point. On these CPUs, a wmb() only guarantees that the invalidate requests corresponding to the writes will be emitted in order. The wmb() does not guarantee that the reading CPU will process these invalidates in order.



For example, consider a CPU with a partitioned cache, as shown in the following diagram:







Here, even-numbered cache lines are maintained in cache bank 0, and odd-numbered cache lines are maintained in cache bank 1. Suppose that head was maintained in cache bank 0, and that a newly allocated element was maintained in cache bank 1. The insert() code's wmb() would guarantee that the invalidates corresponding to the writes to the next, key, and data fields would appear on the bus before the write to head.next.

But suppose that the reading CPU's cache bank 1 was extremely busy, with lots of pending invalidates and outstanding accesses, and that the reading CPU's cache bank 0 was idle. The invalidation corresponding to head.next could then be processed before that of the three fields. If search() were to be executing just at that time, it would pick up the new value of head.next, but, since the invalidates corresponding to the three fields had not yet been processed, it could pick up the old (garbage!) value corresponding to these fields, possibly resulting in an oops or worse.

Placing an rmb() between the access to head.next and the three fields fixes this problem. The rmb() forces all outstanding invalidates to be processed before any subsequent reads are allowed to proceed. Since the invalidates corresponding to the three fields arrived before that of head.next, this will guarantee that if the new value of head.next was read, then the new value of the three fields will also be read. No oopses (or worse).



However, all the rmb()s add complexity, are easy to leave out, and hurt performance of all architectures. And if you forget a needed rmb(), you end up with very intermittent and difficult-to-diagnose memory-corruption errors. Just what we don't need in Linux!

So, there is strong motivation for a way of eliminating the need for these rmb()s.

Solutions for lockfree search and insertions



Search and Insert Using wmbdd()



It would be much nicer (and faster, on many architectures) to have a primitive similar to wmb(), but that allowed read-side data dependencies to substitute for an explicit rmb().

It is possible to do this (see patch). With such a primitive, the code looks as follows:



struct el *insert(long key, long data)
{
    struct el *p;

    p = kmalloc(sizeof(*p), GFP_ATOMIC);
    if (p == NULL)
        return NULL;    /* allocation failed */
    spin_lock(&mutex);
    p->next = head.next;
    p->key = key;
    p->data = data;
    wmbdd();            /* write barrier that readers' data dependencies can rely on */
    head.next = p;
    spin_unlock(&mutex);
    return p;
}

struct el *search(long key)
{
    struct el *p;

    p = head.next;
    while (p != &head) {
        if (p->key == key) {
            return (p);
        }
        p = p->next;
    }
    return (NULL);
}





This code is much nicer: no rmb()s are required, searches proceed fully in parallel with no locks or writes, and no intermittent data corruption.



Search and Insert Using read_barrier_depends()



Introduce a new primitive read_barrier_depends() that is defined to be an rmb() on Alpha, and a nop on other architectures. This removes the read-side performance problem for non-Alpha architectures, but still leaves the read-side read_barrier_depends(). It is almost possible for the compiler to do a good job of generating these (assuming that a "lockfree" gcc struct-field attribute is created and used), but, unfortunately, the compiler cannot reliably tell when the relevant lock is held. (If the lock is held, the read_barrier_depends() calls should not be generated.)

After discussions on lkml about this, it was decided that putting in an explicit read_barrier_depends() is the appropriate thing to do in the Linux kernel. Linus also suggested that the barrier names be made more explicit. With such a primitive, the code looks as follows:



struct el *insert(long key, long data)
{
    struct el *p;

    p = kmalloc(sizeof(*p), GFP_ATOMIC);
    if (p == NULL)
        return NULL;    /* allocation failed */
    spin_lock(&mutex);
    p->next = head.next;
    p->key = key;
    p->data = data;
    write_barrier();    /* fields must be written before the element is linked in */
    head.next = p;
    spin_unlock(&mutex);
    return p;
}

struct el *search(long key)
{
    struct el *p;

    p = head.next;
    while (p != &head) {
        read_barrier_depends();  /* rmb() on Alpha, nop elsewhere */
        if (p->key == key) {
            return (p);
        }
        p = p->next;
    }
    return (NULL);
}





A preliminary patch for this is barriers-2.5.7-1.patch. The future releases of this patch can be found along with the RCU package here.
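To make the "rmb() on Alpha, nop elsewhere" definition concrete, here is a user-space sketch under stated assumptions. The two macros are hypothetical stand-ins, not the contents of the patch: `read_barrier_depends()` expands to the Alpha `mb` instruction when compiled for Alpha and to nothing otherwise, and `write_barrier()` is approximated with a C11 release fence. `struct item`, `ring_init`, `ring_insert` and `ring_search` are made-up names, and writers are assumed to be serialized by a lock that is omitted. Note that the C11 standard itself does not promise dependency ordering for relaxed loads; the sketch leans on the hardware behaviour the article describes, and strictly portable C11 code would use acquire loads instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the primitives named in the text. */
#if defined(__alpha__)
#define read_barrier_depends() __asm__ __volatile__("mb" ::: "memory")
#else
#define read_barrier_depends() do { } while (0)
#endif
#define write_barrier() atomic_thread_fence(memory_order_release)

struct item {
    struct item *_Atomic next;
    long key;
    long data;
};

static struct item ring_head;

static void ring_init(void)
{
    atomic_init(&ring_head.next, &ring_head);  /* empty circular list */
}

/* Writers are assumed to hold a lock (omitted in this sketch). */
static struct item *ring_insert(long key, long data)
{
    struct item *p = malloc(sizeof(*p));

    if (p == NULL)
        return NULL;
    p->key = key;
    p->data = data;
    atomic_init(&p->next,
                atomic_load_explicit(&ring_head.next, memory_order_relaxed));
    write_barrier();  /* fields are ordered before the publishing store */
    atomic_store_explicit(&ring_head.next, p, memory_order_relaxed);
    return p;
}

static struct item *ring_search(long key)
{
    struct item *p = atomic_load_explicit(&ring_head.next, memory_order_relaxed);

    while (p != &ring_head) {
        read_barrier_depends();  /* only costs anything on Alpha */
        if (p->key == key)
            return p;
        p = atomic_load_explicit(&p->next, memory_order_relaxed);
    }
    return NULL;
}
```

The read side keeps one explicit marker per dereferenced pointer, but on every architecture except Alpha that marker compiles away.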





Other Approaches Considered





Just make wmb() work like wmbdd(), so that data dependencies act as implied rmb()s. Although wmbdd()'s semantics are much more intuitive, there are a number of uses of wmb() in Linux that do not require the stronger semantics of wmbdd(), and strengthening the semantics would incur unnecessary overhead on many CPUs--or require many changes to the code, and thus a much larger patch.

Just drop support for Alpha. After all, Compaq seems to be phasing it out, right? There are nonetheless a number of Alphas out there running Linux, and Compaq (or perhaps HP) will be manufacturing new Alphas for quite a few years to come. Microsoft would likely focus quite a bit of negative publicity on Linux's dropping support for anything (never mind that they currently support only two CPU architectures). And the code to make Alpha work correctly is not all that complex, and it does not impact performance of other CPUs.

Besides, we cannot be 100% certain that there won't be some other CPU lacking a synchronous invalidation instruction...
