The Linux fsync System Call Explained

Published: 2013-11-14 19:55:10   Author: anonymous

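The Linux fsync function flushes a file's modified in-memory data to the storage device; it is often used for backups.

Description:

Flushes all modified in-memory data of the file referred to by fd to the storage device.

Usage:

#include <unistd.h>

int fsync(int fd);

Parameters:

fd: the file descriptor.

Return value:

Returns 0 on success. On failure, returns -1 and sets errno to one of the following values:

EBADF: the file descriptor is invalid

EIO: an I/O error occurred during synchronization

EROFS, EINVAL: fd refers to a special file (such as a pipe, FIFO, or socket) that does not support synchronization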
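As a minimal sketch of the call pattern described above, the following C helper writes a buffer to a file and then calls fsync, checking the documented error return. The name save_durably and the file mode 0644 are assumptions made for illustration; they do not come from the original article.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write len bytes from buf to path and force them to storage.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 * save_durably is a hypothetical helper name, not a library function. */
int save_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted before any byte was written: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {                /* EBADF, EIO, EROFS or EINVAL */
        int saved = errno;
        fprintf(stderr, "fsync(%s): %s\n", path, strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A caller would use it as save_durably("/tmp/example.txt", "hello\n", 6) and treat a return of -1 as "the data may not be on the storage device".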

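Forcing the system cache out to files: the sync and fsync functions, and how fflush and fsync relate and differ (2010-05-10 11:25)

Traditional UNIX implementations have a buffer cache in the kernel, and most disk I/O passes through it. When data is written to a file, the kernel normally copies it into one of these buffers first. If the buffer is not yet full, it is not queued for output; the kernel waits until it fills up, or until the buffer has to be reused for some other disk block, and only then puts it on the output queue. The actual I/O takes place when the buffer reaches the head of that queue. This style of output is called delayed write (Bach [1986], Chapter 3, discusses delayed write in detail). Delayed write reduces the number of disk reads and writes, but it slows down the updating of file contents: data destined for a file may not reach the disk for some time. If the system fails, this delay can cause file updates to be lost. To keep the file system on disk consistent with the contents of the buffer cache, UNIX systems provide two system calls, sync and fsync.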

#include <unistd.h>

void sync(void);

int fsync(int filedes);

Returns: 0 if OK, -1 on error

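sync simply queues all modified block buffers for writing and then returns; it does not wait for the actual I/O to finish. A system daemon (usually called update) typically calls sync every 30 seconds, which guarantees that the kernel's block buffers are flushed regularly. The sync(1) command also calls the sync function.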

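fsync refers only to a single file (specified by the file descriptor filedes) and waits for the I/O to complete before returning. fsync is intended for applications such as databases, which need to be sure that modified blocks are written to disk right away. Compare fsync with the O_SYNC flag (see Section 3.13): a call to fsync updates the file's contents at that point, whereas with O_SYNC the file's contents are updated on every call to write.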
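The contrast between O_SYNC and fsync can be sketched in C as two variants of the same job. The function names, the record format, and the file mode are assumptions made for this illustration only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write `count` short text records to fd. Returns 0 on success, -1 on error.
 * A short write is treated as an error to keep the sketch small. */
static int write_records(int fd, int count)
{
    char line[32];
    for (int i = 0; i < count; i++) {
        int len = snprintf(line, sizeof line, "record %d\n", i);
        if (write(fd, line, (size_t)len) != (ssize_t)len)
            return -1;
    }
    return 0;
}

/* Variant 1: O_SYNC, so every write() returns only after its data is synchronized. */
int write_with_osync(const char *path, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_records(fd, count);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Variant 2: plain writes land in the kernel cache; one fsync() flushes them together. */
int write_then_fsync(const char *path, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_records(fd, count);
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both variants leave the same bytes in the file. The difference is when the waiting happens: once per write in the first, once at the end in the second.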

How fflush and fsync relate and differ

[Reposted from] http://blog.chinaunix.net/u2/73874/showart_1421917.html

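1. Provider: fflush is a function provided by libc.a; fsync is a system call provided by the kernel.

2. Prototype: fflush takes a FILE * argument, fflush(FILE *); fsync takes an int file descriptor, fsync(int fd);

3. Function: fflush calls write to push the C library's buffer out to disk [in fact, into the kernel's buffer]; fsync flushes the kernel's buffer to disk.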

C library buffer ----- fflush -----> kernel buffer ----- fsync -----> disk
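The two stages can be observed from a small C function. This is only a sketch: the helper names are hypothetical, and it assumes the stream is fully buffered (made explicit with setvbuf), so that the 5 bytes stay in the C library buffer until fflush is called.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of `path` as the kernel currently reports it, or -1 on error. */
static long kernel_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Write 5 bytes through stdio and record the size the kernel reports
 * before and after fflush(). Returns 0 on success, -1 on error. */
int two_stage_flush(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);   /* make full buffering explicit */

    int ok = fputs("hello", fp) != EOF;  /* data sits in the C library buffer */
    *before = kernel_size(path);         /* the kernel has not seen the bytes yet */

    ok = ok && fflush(fp) == 0;          /* stage 1: C library buffer -> kernel buffer */
    *after = kernel_size(path);          /* the kernel now knows about the bytes */

    ok = ok && fsync(fileno(fp)) == 0;   /* stage 2: kernel buffer -> disk */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The size reported before fflush is that of the freshly truncated file; the size after fflush includes the written bytes, even though nothing is guaranteed to be on disk until fsync returns.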

Here is another reposted article, this one in English:

Write-back support

UBIFS supports write-back, which means that file changes do not go to the flash media straight away, but they are cached and go to the flash later, when it is absolutely necessary. This helps to greatly reduce the amount of I/O which results in better performance. Write-back caching is a standard technique which is used by most file systems like ext3 or XFS.

In contrast, JFFS2 does not have write-back support and all the JFFS2 file system changes go to the flash synchronously. Well, this is not completely true and JFFS2 does have a small buffer of a NAND page size (if the underlying flash is NAND). This buffer contains the last written data and is flushed once it is full. However, because the amount of cached data is very small, JFFS2 is very close to a synchronous file system.

Write-back support requires application programmers to take extra care to synchronize important files in time. Otherwise the files may become corrupted or disappear in case of power cuts, which happen very often in many embedded devices. Let's take a glimpse at the Linux manual pages:

$ man 2 write

....

NOTES

A successful return from write() does not make any guarantee that data

has been committed to disk. In fact, on some buggy implementations, it

does not even guarantee that space has successfully been reserved for

the data. The only way to be sure is to call fsync(2) after you are

done writing all your data.

...

This is true for UBIFS (except for the "some buggy implementations" part, because UBIFS does reserve space for cached dirty data). This is also true for JFFS2, as well as for any other Linux file system.

However, some (perhaps not very good) user-space programmers do not take write-back into account. They do not read manual pages carefully. When such applications are used in embedded systems which run JFFS2, they work fine, because JFFS2 is almost synchronous. Of course, the applications are buggy, but they appear to work well enough with JFFS2. The bugs show up when UBIFS is used instead. Please be careful and check/test your applications with respect to power cut tolerance if you switch from JFFS2 to UBIFS. The following is a list of useful hints and advice.

If you want to switch into synchronous mode, use the -o sync option when mounting UBIFS; however, the file system performance will drop, so be careful. Also remember that UBIFS mounted in synchronous mode provides fewer guarantees than JFFS2; refer to this section for details.

Always keep in mind the above statement from the manual pages and run fsync() for all important files you change. Of course, there is no need to synchronize "throw-away" temporary files; just consider how important the file data is and decide. Do not use fsync() unnecessarily, because this will hurt performance.

If you want to be more precise, you may use fdatasync(), in which case only data changes will be flushed, but not inode meta-data changes (e.g., "mtime" or permissions). This may be more efficient than fsync() if the synchronization is done often, e.g., in a loop; otherwise just stick with fsync().

In a shell, the sync command may be used, but it synchronizes the whole file system, which may not be optimal; there is also a similar libc sync() function.

You may use the O_SYNC flag of the open() call; this makes sure all the data (but not meta-data) changes go to the media before the write() operation returns. In general, though, it is better to use fsync(), because O_SYNC makes each write synchronous, while fsync() lets you accumulate many writes and synchronize them at once.

It is possible to make certain inodes synchronous by default by setting the "sync" inode flag; in a shell, the chattr +S command may be used; in C programs, use the FS_IOC_SETFLAGS ioctl command. Note that the mkfs.ubifs tool checks for the "sync" flag in the original FS tree, so the synchronous files in the original FS tree will be synchronous in the resulting UBIFS image.

Let us stress that the above items are true for any Linux file system, including JFFS2.
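For the "sync" inode flag mentioned in the list, a C program can read the current flags with FS_IOC_GETFLAGS, add FS_SYNC_FL, and write them back with FS_IOC_SETFLAGS. The sketch below assumes a Linux system with the kernel headers installed; set_sync_flag is a hypothetical helper name, and the call can fail on file systems that do not support inode flags or when the caller does not own the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Set the "sync" inode flag on path, roughly what `chattr +S path` does.
 * Returns 0 on success, -1 on failure with errno set. */
int set_sync_flag(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int flags = 0;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);   /* read the current inode flags */
    if (rc == 0) {
        flags |= FS_SYNC_FL;                       /* request synchronous updates */
        rc = ioctl(fd, FS_IOC_SETFLAGS, &flags);   /* write the flags back */
    }

    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

Reading the flags first matters: FS_IOC_SETFLAGS replaces the whole flag set, so writing FS_SYNC_FL alone would clear any other flags already present on the inode.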

fsync() may be called for directories; it synchronizes the directory inode meta-data. The "sync" flag may also be set for directories to make the directory inode synchronous. But the flag is inherited, which means all new children of this directory will also have this flag. New files and sub-directories of this directory will also be synchronous, and their children, and so forth. This feature is very useful if one needs to create a whole sub-tree of synchronous files and directories, or to make all new children of some directory synchronous by default (e.g., /etc).
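Calling fsync() on a directory only requires a file descriptor for it, which open() with O_RDONLY provides. A minimal C sketch follows; sync_directory is a hypothetical name chosen for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Synchronize the directory inode meta-data of dirpath.
 * Returns 0 on success, -1 on failure with errno set. */
int sync_directory(const char *dirpath)
{
    int dfd = open(dirpath, O_RDONLY);   /* directories can be opened read-only */
    if (dfd < 0)
        return -1;

    int rc = fsync(dfd);

    int saved = errno;
    close(dfd);
    errno = saved;
    return rc;
}
```

A typical use is to call it on the parent directory after creating or renaming a file, on file systems where directory changes are not synchronous by themselves.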

The fdatasync() call for directories is "no-op" in UBIFS and all UBIFS operations which change directory entries are synchronous. However, you should not assume this for portability (e.g., this is not true for ext2). Similarly, the "dirsync" inode flag has no effect in UBIFS.

The functions mentioned above work on file-descriptors, not on streams (FILE *). To synchronize a stream, you should first get its file descriptor using the fileno() libc function, then flush the stream using fflush(), and then synchronize the file using fsync() or fdatasync(). You may use other synchronization methods, but remember to flush the stream before synchronizing the file. The fflush() function flushes the libc-level buffers, while sync(), fsync(), etc flush kernel-level buffers.
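The stream procedure described above fits in one small C helper. The name sync_stream and its data_only parameter are assumptions made for this sketch; the parameter simply selects fdatasync() instead of fsync().

```c
#include <stdio.h>
#include <unistd.h>

/* Synchronize a stdio stream all the way to the media.
 * data_only != 0 uses fdatasync(), otherwise fsync().
 * Returns 0 on success, -1 on failure. */
int sync_stream(FILE *fp, int data_only)
{
    int fd = fileno(fp);          /* get the descriptor behind the stream */
    if (fd < 0)
        return -1;
    if (fflush(fp) != 0)          /* flush the libc-level buffer into the kernel */
        return -1;
    return data_only ? fdatasync(fd) : fsync(fd);   /* flush the kernel-level buffers */
}
```

The order is the important part: calling fsync() before fflush() would leave the bytes still held in the libc buffer unsynchronized.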

Please refer to this FAQ entry for information about how to atomically update the contents of a file. Theodore Ts'o's article is also good reading.

Write-back knobs in Linux

Linux has several knobs in "/proc/sys/vm" which you may use to tune write-back. The knobs are global, so they affect all file-systems. Please refer to the "Documentation/sysctl/vm.txt" file for more information. The file may be found in the Linux kernel source tree. Below, the interesting knobs are described in the UBIFS context and in a simplified form.

dirty_writeback_centisecs - how often the Linux periodic write-back thread wakes up and writes out dirty data. This is a mechanism which makes sure all dirty data hits the media at some point.

dirty_expire_centisecs - dirty data expire period. This is maximum time data may stay dirty. After this period of time it will be written back by the Linux periodic write-back thread. IOW, the periodic write-back thread wakes up every "dirty_writeback_centisecs" centi-seconds and synchronizes data which was dirtied "dirty_expire_centisecs" centi-seconds ago.

dirty_background_ratio - maximum amount of dirty data in percent of total memory. When the amount of dirty data becomes larger, the periodic write-back thread starts synchronizing it until it becomes smaller. Even non-expired data will be synchronized. This may be used to set a "soft" limit for the amount of dirty data in the system.

dirty_ratio - maximum amount of dirty data at which writers will first synchronize the existing dirty data before adding more. IOW, this is a "hard" limit of the amount of dirty data in the system.

Note, UBIFS additionally has small write-buffers which are synchronized every 3-5 seconds. This means that most of the dirty data are delayed by dirty_expire_centisecs centi-seconds, but the last few KiB are additionally delayed by 3-5 seconds.
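To see the current values of the four knobs on a running system, a program can simply read the corresponding files under /proc/sys/vm. The C sketch below assumes a Linux system with /proc mounted; the function names are hypothetical.

```c
#include <stdio.h>

/* Read one integer knob from /proc/sys/vm.
 * Returns 0 on success, -1 if the knob cannot be read. */
int read_vm_knob(const char *name, long *value)
{
    char path[128];
    int n = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int ok = fscanf(fp, "%ld", value) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Print the write-back knobs discussed in the text. */
void print_writeback_knobs(void)
{
    static const char *const knobs[] = {
        "dirty_writeback_centisecs", "dirty_expire_centisecs",
        "dirty_background_ratio", "dirty_ratio",
    };
    for (size_t i = 0; i < sizeof knobs / sizeof knobs[0]; i++) {
        long v;
        if (read_vm_knob(knobs[i], &v) == 0)
            printf("%-26s = %ld\n", knobs[i], v);
        else
            printf("%-26s unavailable\n", knobs[i]);
    }
}
```

The values printed depend on the kernel configuration and on any tuning applied by the distribution, so no particular output should be expected.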

UBIFS write-buffer

UBIFS is an asynchronous file-system (read this section for more information). Like other Linux file-systems, it utilizes the page cache. The page cache is a generic Linux memory-management mechanism. It may be very large and cache a lot of data. When you write to a file, the data are written to the page cache, marked as dirty, and the write returns (unless the file is synchronous). Later the data are written back.

Write-buffer is an additional UBIFS buffer, which is implemented inside UBIFS, and it sits between the page cache and the flash. This means that write-back actually writes to the write-buffer, not directly to the flash.

The write-buffer is designed to speed up UBIFS on NAND flashes. NAND flashes consist of NAND pages, which are usually 512 bytes, 2KiB or 4KiB in size. A NAND page is the minimal read/write unit of NAND flash (see this section).

Write-buffer size is equivalent to NAND page size (so it is tiny compared to the page cache). Its purpose is to accumulate small writes, and write full NAND pages instead of partially filled ones. Indeed, imagine we have to write 4 512-byte nodes at half-second intervals, and the NAND page size is 2KiB. Without the write-buffer we would have to write 4 NAND pages and waste 6KiB of flash space, while the write-buffer allows us to write only once and waste nothing. This means we write less, we create less dirty space so the UBIFS garbage collector has less work to do, and we save power.
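The arithmetic behind the 6KiB figure can be checked with a few lines of C. This is a simplified model, not UBIFS code: it assumes each unbuffered node starts on a fresh NAND page, and that buffered nodes are packed back to back with only the last page padded.

```c
#include <stddef.h>

/* Bytes of flash left unused when every node is written to its own page(s). */
size_t wasted_without_buffer(size_t node_size, size_t nodes, size_t page_size)
{
    size_t pages_per_node = (node_size + page_size - 1) / page_size;
    return nodes * (pages_per_node * page_size - node_size);
}

/* Bytes left unused when nodes are accumulated and written as full pages. */
size_t wasted_with_buffer(size_t node_size, size_t nodes, size_t page_size)
{
    size_t total = node_size * nodes;
    size_t pages = (total + page_size - 1) / page_size;
    return pages * page_size - total;
}
```

With 4 nodes of 512 bytes and a 2048-byte page, the first function gives 4 x (2048 - 512) = 6144 bytes, i.e. 6KiB, and the second gives 0, matching the example in the text.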

Well, the example shows an ideal situation, and even with the write-buffer we may waste space, for example in case of synchronous I/O, or if the data arrives with long time intervals. This is because the write-buffer has an associated timer, which flushes it every 3-5 seconds, even if it isn't full. We do this for data integrity reasons.

Of course, when UBIFS has to write a lot of data, it does not use the write-buffer. Only the last part of the data, which is smaller than the NAND page, ends up in the write-buffer and waits for more data, until it is flushed by the timer.

The write-buffer implementation is a little more complex, and we actually have several of them - one for each journal head. But this does not change the basic idea behind the write-buffer.

A few notes with regard to synchronization:

"sync()" also synchronizes all write-buffers;

"fsync(fd)" also synchronizes all write-buffers which contain pieces of "fd";

synchronous files, as well as files opened with "O_SYNC", bypass write-buffers, so the I/O is indeed synchronous for these files;

write-buffers are also bypassed if the file-system is mounted with the "-o sync" mount option.

Take into account that write-buffers delay the data synchronization timeout defined by "dirty_expire_centisecs" (see here) by 3-5 seconds. However, since write-buffers are small, only a little data is delayed.

UBIFS in synchronous mode vs JFFS2

When UBIFS is mounted in synchronous mode (-o sync mount options) - all file system operations become synchronous. This means that all data are written to flash before the file-system operations return.

For example, if you write 10MiB of data to a file f.dat using the write() call, and UBIFS is in synchronous mode, then UBIFS guarantees that all 10MiB of data and the meta-data (file size and date changes) will reach the flash media before write() returns. And if a power cut happens after the write() call returns, the file will contain the written data.

The same is true for situations when f.dat was opened with O_SYNC or has the sync flag (see man 1 chattr).

It is well-known that the JFFS2 file-system is synchronous (except for a small write-buffer). However, UBIFS in synchronous mode is not the same as JFFS2 and provides somewhat fewer guarantees than JFFS2 does with respect to sudden power cuts.

In JFFS2 all the meta-data (like inode atime/mtime/ctime, inode size, UID/GID, etc) are stored in the data node headers. Data nodes carry 4KiB of (compressed) data. This means that the meta-data information is duplicated in many places, but this also means that every time JFFS2 writes a data node to the flash media, it updates inode size as well. So when JFFS2 mounts it scans the flash media, finds the latest data node, and fetches the inode size from there.

In practice this means that JFFS2 will write these 10MiB of data sequentially, from the beginning to the end. And if you have a power cut, you will just lose some amount of data at the end of the inode. For example, if JFFS2 starts writing those 10MiB of data, writes 5MiB, and a power cut happens, you will end up with a 5MiB f.dat file. You lose only the last 5MiB.

Things are a little bit more complex in the case of UBIFS, where data are stored in data nodes and meta-data are stored in (separate) inode nodes. The meta-data are not duplicated in each data node, as in JFFS2. UBIFS never writes data nodes beyond the on-flash inode size. If it has to write a data node and the data node is beyond the on-flash inode size (the in-memory inode has the up-to-date size, but it is dirty and has not been flushed yet), then UBIFS first writes the inode to the media, and then it starts writing the data. And if the write is interrupted, you lose data nodes and you have holes (or old data nodes, if you are overwriting). Let's consider an example.

User creates an empty file f.dat. The file is synchronous, or UBIFS is mounted in synchronous mode. User calls the write() function with a 10MiB buffer.

The kernel first copies all 10MiB of the data to the page cache. Inode size is changed to 10MiB as well and the inode is marked as dirty. Nothing has been written to the flash media so far. If a power cut happens at this point, the user will end up with an empty f.dat file.

UBIFS sees that the I/O has to be synchronous, and starts synchronizing the inode. First of all, it writes the inode node to the flash media. If a power cut happens at this moment, the user will end up with a 10MiB file which contains no data (a hole), and reading this file will return 10MiB of zeroes.

UBIFS starts writing the data. If a power cut happens at this point, the user will end up with a 10MiB file containing a hole at the end.

Note, if the I/O was not synchronous, UBIFS would skip the last step and would just return. And the actual write-back would then happen in back-ground. But power cuts during write-back could anyway lead to files with holes at the end.

Thus, synchronous I/O in UBIFS provides fewer guarantees than JFFS2 I/O: UBIFS can leave holes at the end of files. In an ideal world, applications should not assume anything about the contents of files which were not synchronized before a power cut happened. And "mainstream" file-systems like ext3 do not provide JFFS2-like guarantees.

However, UBIFS is sometimes used as a JFFS2 replacement and people may want it to behave the same way as JFFS2 if it is mounted synchronously. This is doable, but needs some non-trivial development, so this was not implemented so far. On the other hand, there was no strong demand. You may implement this as an exercise, or you may try to convince UBIFS authors to do this.

Synchronization exceptions for buggy applications

As this section describes, UBIFS is an asynchronous file-system, and applications should synchronize their files whenever it is required. The same applies to most Linux file-systems, e.g. XFS.

However, many applications ignore this and do not synchronize files properly. And there was a huge war between user-space and kernel developers related to the ext4 delayed allocation feature. Please see Theodore Ts'o's blog post. More information may be found in this LWN article.

In short, the flame war was about 2 cases. The first case was about the atomic re-name, where many user-space programs did not synchronize the copy before re-naming it. The second case was about applications which truncate files, then change them. There was no final agreement, but the "we cannot ignore the real world" argument found ext4 developers' understanding, and there were 2 ext4 changes which help both problems.

Roughly speaking, the first change made ext4 synchronize files on close if they were previously truncated. This was a hack from file-system point of view, but it "fixed" applications which truncate files, write new contents, and close the files without synchronizing them.

The second change made ext4 synchronize the renamed file.

Well, this is not an exactly correct description, because ext4 does not write the files synchronously, but actually initiates asynchronous write-out of the files, so the performance hit is not very high. For the truncation case this means that the file is synchronized soon after it is closed. For the re-name case this means that ext4 writes data before it writes the re-name meta-data.

However, the application writers should never rely on these things, because this is not portable. Instead, they should properly synchronize files. The ext4 fixes were because there were many broken user-space applications in the wild already.
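What "properly synchronize files" can look like for the atomic re-name case is sketched below in C: write the new contents to a temporary file, fsync() it, rename() it over the target, and then fsync() the directory. The function name, the ".tmp" suffix, and the fixed path buffer sizes are assumptions made for this illustration, and a short write is simply treated as an error.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace dir/name with len bytes from buf so that, after a power cut,
 * the file holds either the old or the new contents.
 * Returns 0 on success, -1 on failure. */
int replace_file_atomically(const char *dir, const char *name,
                            const void *buf, size_t len)
{
    char tmp[4096], dst[4096];
    int n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof dst)
        return -1;
    n = snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Synchronize the copy BEFORE renaming it; this is the step many programs skip. */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Synchronize the directory so the re-name itself reaches the media. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

This ordering does not rely on the ext4 behaviour described above, so it should behave the same way on file systems that lack those workarounds.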

We have plans to implement these features in UBIFS, but this has not been done yet. The problem is that UBI/MTD are fully synchronous and we cannot initiate asynchronous write-out, so we'd have to synchronously write files on close/rename, which is slow. So implementing these features would require implementing asynchronous I/O in UBI, which is a big job. But feel free to do this :-).
