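In an interview I was asked how the Linux tail command is implemented. I was completely stumped, improvised an answer from imagination, and the interviewer ended the interview as soon as I finished. Afterwards I read the source code and got a rough idea of how it works.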
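Put simply, tail prints the last n lines of a file; when n is not given, it defaults to the last 10 lines. The core function in the source is as follows. Source: tail.c in src, coreutils.git (GNU coreutils)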
/* Print the last N_LINES lines from the end of file FD.
   Go backward through the file, reading 'BUFSIZ' bytes at a time (except
   probably the first), until we hit the start of the file or have
   read NUMBER newlines.
   START_POS is the starting position of the read pointer for the file
   associated with FD (may be nonzero).
   END_POS is the file offset of EOF (one larger than offset of last byte).
   Return true if successful.  */

static bool
file_lines (char const *pretty_filename, int fd, uintmax_t n_lines,
            off_t start_pos, off_t end_pos, uintmax_t *read_pos)
{
  char buffer[BUFSIZ];
  size_t bytes_read;
  off_t pos = end_pos;

  if (n_lines == 0)
    return true;

  /* Set 'bytes_read' to the size of the last, probably partial, buffer;
     0 < 'bytes_read' <= 'BUFSIZ'.  */
  bytes_read = (pos - start_pos) % BUFSIZ;
  if (bytes_read == 0)
    bytes_read = BUFSIZ;
  /* Make 'pos' a multiple of 'BUFSIZ' (0 if the file is short), so that all
     reads will be on block boundaries, which might increase efficiency.  */
  pos -= bytes_read;
  xlseek (fd, pos, SEEK_SET, pretty_filename);
  bytes_read = safe_read (fd, buffer, bytes_read);
  if (bytes_read == SAFE_READ_ERROR)
    {
      error (0, errno, _("error reading %s"), quoteaf (pretty_filename));
      return false;
    }
  *read_pos = pos + bytes_read;

  /* Count the incomplete line on files that don't end with a newline.  */
  if (bytes_read && buffer[bytes_read - 1] != line_end)
    --n_lines;

  do
    {
      /* Scan backward, counting the newlines in this bufferfull.  */

      size_t n = bytes_read;
      while (n)
        {
          char const *nl;
          nl = memrchr (buffer, line_end, n);
          if (nl == NULL)
            break;
          n = nl - buffer;
          if (n_lines-- == 0)
            {
              /* If this newline isn't the last character in the buffer,
                 output the part that is after it.  */
              xwrite_stdout (nl + 1, bytes_read - (n + 1));
              *read_pos += dump_remainder (false, pretty_filename, fd,
                                           end_pos - (pos + bytes_read));
              return true;
            }
        }

      /* Not enough newlines in that bufferfull.  */
      if (pos == start_pos)
        {
          /* Not enough lines in the file; print everything from
             start_pos to the end.  */
          xlseek (fd, start_pos, SEEK_SET, pretty_filename);
          *read_pos = start_pos + dump_remainder (false, pretty_filename, fd,
                                                  end_pos);
          return true;
        }
      pos -= BUFSIZ;
      xlseek (fd, pos, SEEK_SET, pretty_filename);

      bytes_read = safe_read (fd, buffer, BUFSIZ);
      if (bytes_read == SAFE_READ_ERROR)
        {
          error (0, errno, _("error reading %s"), quoteaf (pretty_filename));
          return false;
        }

      *read_pos = pos + bytes_read;
    }
  while (bytes_read > 0);

  return true;
}
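As the source above shows, tail first uses xlseek to position itself at the start of the last block of the file (the first xlseek call; this block may be shorter than BUFSIZ) and reads it into buffer with safe_read. It then scans buffer backward with memrchr, counting occurrences of line_end (the line terminator, a newline by default) in the inner while loop. Whenever a block does not contain enough of them, it moves pos back by BUFSIZ, seeks to the start of the preceding block (pos -= BUFSIZ followed by xlseek), and reads another BUFSIZ bytes. Once n_lines terminators have been counted, it writes everything after that terminator to the output stream and returns (the xwrite_stdout and dump_remainder calls). If pos reaches start_pos first, the file has fewer than n_lines lines, and everything from start_pos to the end is printed.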
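To make the backward scan concrete, here is a minimal sketch of the same idea outside coreutils. It is not the GNU code: the function name last_lines_offset is hypothetical, the terminator is hard-coded to a newline, the scan always stops at offset 0 instead of start_pos, and a short read is simply treated as an error. It returns the offset where the last n_lines lines begin, so a caller only has to copy from that offset to EOF.

```c
#include <assert.h>
#include <stdio.h>      /* BUFSIZ */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of coreutils): return the offset at which
   the last n_lines lines of the seekable file fd begin, or -1 on error. */
off_t last_lines_offset(int fd, unsigned long n_lines)
{
    char buf[BUFSIZ];

    assert(fd >= 0);
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    if (n_lines == 0 || end == 0)
        return end;

    /* The last block may be partial; every earlier block is BUFSIZ bytes. */
    off_t len = end % BUFSIZ;
    if (len == 0)
        len = BUFSIZ;
    off_t pos = end - len;
    int last_block = 1;

    for (;;) {
        if (lseek(fd, pos, SEEK_SET) < 0)
            return -1;
        if (read(fd, buf, (size_t)len) != (ssize_t)len)
            return -1;          /* read error, or the file changed meanwhile */

        /* An unterminated final line still counts as one line. */
        if (last_block && buf[len - 1] != '\n')
            n_lines--;
        last_block = 0;

        /* Walk backward; the newline found when the counter hits zero
           sits just before the first line to print. */
        for (off_t i = len; i-- > 0; )
            if (buf[i] == '\n' && n_lines-- == 0)
                return pos + i + 1;

        if (pos == 0)
            return 0;           /* fewer lines than requested: print it all */
        pos -= BUFSIZ;
        len = BUFSIZ;
    }
}
```

Only one block of BUFSIZ bytes is held in memory at any time, which is the property the article points out; the cost is one seek and one read per block scanned.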
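As you can see, the whole process never needs to read the entire file into memory; it relies on repeated calls to xlseek (a wrapper around lseek). I won't go into how lseek works internally here. What it does is move the read/write offset associated with fd to the specified offset and return the resulting position.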
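The function takes the parameters start_pos and end_pos, which mark where the file's content starts (the position of the read pointer, which may be nonzero) and where it ends. These are also obtained with lseek, in the calling function tail_lines.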
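As an illustration of how two such offsets can be obtained with lseek alone, here is a small sketch. The helper name get_scan_range is hypothetical and this is not the code of tail_lines; it only shows the two standard idioms: lseek with SEEK_CUR and offset 0 reports the current position, and lseek with SEEK_END and offset 0 reports the offset of EOF.

```c
#include <assert.h>
#include <stdio.h>      /* perror */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report where reading would start (the current
   offset of fd) and where the file ends, leaving the offset unchanged.
   Returns 0 on success, -1 if fd cannot seek (a pipe, for instance). */
int get_scan_range(int fd, off_t *start_pos, off_t *end_pos)
{
    assert(start_pos != NULL && end_pos != NULL);

    off_t cur = lseek(fd, 0, SEEK_CUR);   /* query only, moves nothing */
    if (cur < 0) {
        perror("lseek");
        return -1;
    }
    off_t end = lseek(fd, 0, SEEK_END);   /* jump to EOF to learn its offset */
    if (end < 0 || lseek(fd, cur, SEEK_SET) < 0) {
        perror("lseek");
        return -1;
    }
    *start_pos = cur;
    *end_pos = end;
    return 0;
}
```

The failure case matters in practice: pipes and terminals cannot seek, so a backward scan like file_lines is only possible when these calls succeed.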
tail has another feature: with -f (or the tailf command) it keeps showing content that is appended to the file. So how is that implemented?
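The principle is actually simple: use inotify or polling to detect whether the file has grown. inotify is a file-event notification facility introduced in Linux 2.6.13 that can watch a file for events directly; when it reports that the file has changed, tail outputs the new content. If inotify is not available, tail polls the file for updates, and when it finds none it sleeps for 1 s before checking again. The code behind tail -f is referenced below; it involves many Linux file-system details, so I won't analyze it in depth here.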
inotify implementation: tail_forever_inotify
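For readers who have not used inotify, the following is a minimal, Linux-only sketch of the waiting side. It is far simpler than tail_forever_inotify and is not derived from it; the helper names watch_for_writes and wait_for_write are hypothetical. It watches a single path for IN_MODIFY and blocks in read until an event arrives.

```c
#include <assert.h>
#include <stdio.h>          /* perror */
#include <sys/inotify.h>    /* Linux only */
#include <unistd.h>

/* Hypothetical helper: start watching path for modifications.
   Returns an inotify descriptor, or -1 if the watch cannot be set up. */
int watch_for_writes(const char *path)
{
    assert(path != NULL);
    int ifd = inotify_init();
    if (ifd < 0) {
        perror("inotify_init");
        return -1;
    }
    if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        perror("inotify_add_watch");
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Hypothetical helper: block until events arrive on ifd. Returns 1 if any
   of them is a modification, 0 if none is, -1 on read error. */
int wait_for_write(int ifd)
{
    /* The union keeps the byte buffer aligned for struct inotify_event. */
    union { struct inotify_event ev; char bytes[4096]; } buf;
    ssize_t len = read(ifd, buf.bytes, sizeof buf.bytes);
    if (len <= 0)
        return -1;

    int modified = 0;
    for (char *p = buf.bytes; p < buf.bytes + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if (ev->mask & IN_MODIFY)
            modified = 1;
        p += sizeof *ev + ev->len;   /* events are variable-length */
    }
    return modified;
}
```

A follow loop would call wait_for_write and then read whatever was appended to the file since the last known offset; the benefit over polling is that the process sleeps in the kernel until something actually changes.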
Polling implementation: tail_forever
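The polling approach can be sketched in a few lines as well. Again, this is an illustration under stated assumptions rather than the logic of tail_forever: the names copy_appended and follow_by_polling are hypothetical, a single file is followed, growth is detected by comparing the size from fstat with the last offset, and a shrinking file is assumed to have been truncated, in which case the sketch starts over from offset 0.

```c
#include <assert.h>
#include <stdio.h>      /* perror */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one polling step. Copies bytes appended to fd since
   offset *pos to out_fd, advances *pos, and returns the number of bytes
   copied (0 if nothing is new) or -1 on error. */
ssize_t copy_appended(int fd, int out_fd, off_t *pos)
{
    struct stat st;
    char buf[4096];
    ssize_t total = 0;

    assert(pos != NULL);
    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size < *pos)
        *pos = 0;               /* assumption: truncated file, start over */
    if (lseek(fd, *pos, SEEK_SET) < 0)
        return -1;

    while (*pos < st.st_size) {
        ssize_t got = read(fd, buf, sizeof buf);
        if (got < 0)
            return -1;
        if (got == 0)
            break;
        if (write(out_fd, buf, (size_t)got) != got)
            return -1;
        *pos += got;
        total += got;
    }
    return total;
}

/* Hypothetical follow loop: poll, and sleep one second when idle. */
int follow_by_polling(int fd, off_t pos)
{
    for (;;) {
        ssize_t n = copy_appended(fd, STDOUT_FILENO, &pos);
        if (n < 0) {
            perror("follow");
            return -1;
        }
        if (n == 0)
            sleep(1);
    }
}
```

A caller would print the last lines first and then pass the resulting end offset as pos to follow_by_polling, which runs until an error occurs. Polling works on any file system, but new data can show up as much as a second late.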
Published: 2024-02-05 04:31:32