電子工程網

Title: How to read a man page? (repost)

Author: linux_Ultra  Time: 2009-6-30 09:28
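Being able to read man pages is the most basic requirement for Linux development, yet many beginners strongly dislike reading them. In our teaching we found that, although we stress from the very first programming lesson that you must read the man page (RTFM = read the f*cking manual), many students still do everything they can to avoid it. After a month, more than half of the students had still never read a single man page carefully.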

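For example, a book called 《Linux常用C函數（中文版）》 (roughly, "Common Linux C Functions, Chinese Edition") is the students' favourite. We have never recommended it, nor handed out an electronic or printed copy, yet almost every student has one. Its style is the complete opposite of a man page: the descriptions of the function interfaces are very terse and cover far less than the key points of the man page, yet every function is tirelessly followed by an example, even when the usage is already as obvious as a louse on a bald head. These examples are usually written very sloppily; for instance, they never check error return values. If you ask me, this book is garbage. Its existence not only wastes space but also does real harm. It is fine as a quick reference for beginners, but people are lazy by nature, and beginners tend to become dependent on it. They no longer need to read man pages and no longer want to. Why read a man page? It rambles on and on, takes ages to get through and still makes no sense, and in the end there is not even an example, so after reading it you still do not know how to call the function. Learning from the book is so much easier: you do not even have to read the words, just paste the example into your own code.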

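This is how beginners get poisoned. First, as I said, the examples are sloppy and full of bugs; they are garbage code, and whoever uses them turns their own code into garbage too. Second, the explanations are so terse that readers easily form one-sided or mistaken understandings. Third, the book feeds the beginner's laziness: you can write plenty of programs with it, but your English, your comprehension and your technical level stagnate for a long time, which hardly counts as learning. Fourth, the book covers only a limited number of C functions. Real work inevitably needs functions that are not in it. Reading the man page would be enough to use them, but beginners who cannot live without the book will cobble together a workaround, using functions that are in the book in place of the ones that are not. In this way, this masterpiece has trained a large crop of fully qualified garbage-code producers.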

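Another book, 《Linux C函數庫詳解詞典》, is a typical representative of the same genre and much the same as the one above. To digress a little, I hold an even more extreme view: no documentation written for programmers should be translated into Chinese. Someone who cannot read English fluently is not a qualified programmer and should learn English properly before learning to program; besides, translation always introduces new errors and inaccuracies and lowers the quality of the documentation. Only documentation for end users should be translated into Chinese, because you cannot demand that users reach some level before they are allowed to use the software.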

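Dodging everything that is hard to understand or master, castrating an inherently complex man page and then teaching the result to beginners, so that they think mastering a technology is this simple, one book in hand and nothing to worry about: that is not education at all. Real education should not avoid complexity. It should teach by example: analyse one complex problem thoroughly for the students, then inspire them to solve other complex problems themselves. Below I dissect one man page in detail. Through this single example I want to show the general patterns in how man pages are written and how a man page should be understood, so that you can apply the same approach elsewhere. I believe this article is more useful to beginners than the two lousy books above.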

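This is the man page for the regular expression C functions in the POSIX specification. To use these functions you must first have a very clear idea of what regular expressions are: what they can be used for, what they cannot, how to do it if they can, and you must be able to write regular expressions fluently. Every man page is highly cohesive and will not teach you such off-topic material. In other words, you must know exactly what job you expect these functions to do. If you do not know what you want to do yourself, the man page cannot help you.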

Author: linux_Ultra  Time: 2009-6-30 09:28
REGEX(3)                Linux Programmer’s Manual                REGEX(3)

NAME
   regcomp, regexec, regerror, regfree - POSIX regex functions

SYNOPSIS
   #include <sys/types.h>
   #include <regex.h>

   int regcomp(regex_t *preg, const char *regex, int cflags);

   int regexec(const regex_t *preg, const char *string, size_t nmatch,
               regmatch_t pmatch[], int eflags);

   size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                   size_t errbuf_size);

   void regfree(regex_t *preg);

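Author: linux_Ultra  Time: 2009-6-30 09:30
This man page describes four functions. All I originally wanted was to match a string against a regular expression and get the match result. In other words, the function I wanted looks like this:

   int my_expect_func(in: regular expression, in: target string, out: match result);
   returns: error code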

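Why are there four functions? Which one is closest to the function I want? What are the others for? This is a good reading habit: guess actively instead of passively receiving information. Understanding is the process of comparing your guesses with the text. If they agree, you understood correctly; if not, you form a new guess and compare again. Passively absorbing information is not understanding.

Input and output parameters are an important hint. The prototypes of Linux library functions are very regular: a const pointer is always an input parameter, and a non-const pointer always carries a value out (it may be an output parameter or an in-out parameter). So the prototype already tells you quite clearly how to call the function, and there is no need for a code example. Look at the first function: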

   int regcomp(regex_t *preg, const char *regex, int cflags);

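preg is an output parameter: you allocate the object yourself beforehand and pass its address to regcomp. regex is an input parameter and cflags is a set of flag bits. We do not yet know what preg is, but regex is obviously "regular expression", and it is a char *, so that must be right. Without reading the description below, we can already guess that the function is called like this: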

   regex_t regobj;
   regcomp(&regobj, "regular expression", flag1|flag2|...);

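Let me stress this again: to understand a piece of text you must bring your experience and reasoning fully to bear, guess actively, and then read on to verify your guess, rather than passively receive information. How do you reason? The function above takes a regular expression and some flags and passes a value out, so presumably it converts the regular expression into another format. That is reasoning. By contrast, suppose I ignored the fact that preg is an output parameter and not a string, and forced the function into the shape of my_expect_func: since the regex parameter is the regular expression, the preg parameter must be the target string. That is not reasoning or guessing; that is a shot in the dark.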
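
The guessed call can be written out as a small compilable sketch. The helper name compile_pattern is made up for illustration, and the choice of the REG_EXTENDED flag (one of the cflags values listed in the man page) is an assumption. Unlike the book examples criticised earlier, it checks the return value.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of the regex API): compile `pattern`
 * into the caller-allocated object `re`.
 * Returns 0 on success, or the nonzero regcomp() error code. */
int compile_pattern(regex_t *re, const char *pattern)
{
    /* REG_EXTENDED is an assumption of this sketch; 0 selects basic syntax. */
    int rc = regcomp(re, pattern, REG_EXTENDED);
    if (rc != 0)
        fprintf(stderr, "regcomp failed with code %d\n", rc);
    return rc;
}
```

A caller would declare `regex_t re;` itself and pass `&re`, which is exactly the "allocate first, then pass the address" shape inferred from the non-const pointer in the prototype.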

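Author: linux_Ultra  Time: 2009-6-30 09:32
If you know something about how regular expressions work, you can use that experience to guess that this function probably converts the regular expression string into a state machine so that target strings can be matched efficiently. If you have used regular expression libraries in other programming languages, that experience also tells you that a regular expression usually goes through a preprocessing step before use. You also need some sensitivity to English abbreviations. The function is called regcomp; reg is regular expression, but is comp compare or compile? If it were compare, there should be two parameters of the same type to compare, as with strcmp. Here it is clearly compile: converting the string form into a binary form, which confirms the earlier guess from another angle. All of this comes from experience rather than reasoning. Experience helps you understand faster and more accurately, but it is not required, because the reasoning based on input and output parameters already led us to the right conclusion. An experienced reader is simply more confident in the guess.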

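Sensitivity to English abbreviations is the most basic skill needed for reading man pages and reading code, but it takes long practice to develop. Perhaps you can learn how to use a function without knowing what its name and parameter names abbreviate; the two lousy books mentioned above will teach you that. But if you always avoid man pages and never practise guessing abbreviations, you will never be able to read other people's code. If you write your own code without reading other people's, you will not even know how to name variables, and what you write will always be garbage. For regcomp and its parameter names: regex is regular expression and regcomp is regular expression compile. So what is preg? reg is regular expression, but what does p stand for? Pointer? That is Microsoft's infamous Hungarian notation, which is surely not used on Linux. My guess is that p here means precompiled. And what is the c in cflags? I do not know, but compare it with the next function: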

   int regexec(const regex_t *preg, const char *string, size_t nmatch,
               regmatch_t pmatch[], int eflags);

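This function has a parameter called eflags. So the c is the c of regcomp and the e is the e of regexec: one is the compile-time flags and the other the execution-time flags. The two kinds of flags must take different values, and the text below must describe them separately. This is another kind of guess: a guess about how the rest of the text is organised. Such guesses also help understanding a great deal. How the names of the remaining functions and parameters are abbreviated is left as an exercise for the reader.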

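preg is an output parameter of regcomp but an input parameter of regexec. By inference, preg is filled in by regcomp and then handed to regexec; that is, the regular expression is passed to regexec in its converted binary form. regexec also has a string input parameter, string, and two match parameters that represent the match result: pmatch is an output parameter giving the start address of a buffer, and nmatch gives the length of that buffer (from experience, this resembles strncpy). This must be the my_expect_func I wanted at the beginning: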

   int my_expect_func(in: regular expression, in: target string, out: match result);
   returns: error code

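preg corresponds to the regular expression, and pmatch and nmatch correspond to the match result, so the input parameter string must be the target string. pmatch is a pointer, but it is written as pmatch[], which says that it points to a group of regmatch_t objects rather than a single one. How many are in the group? The nmatch parameter says. As with strncpy, we should allocate this group of regmatch_t objects ourselves and pass it to the function. So the two functions should be called like this: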

   regex_t regobj;
   regcomp(&regobj, "regular expression", flag1|flag2|...);
   regmatch_t matchbuf[10];
   regexec(&regobj, "target string", 10, matchbuf, flag1|flag2|...);

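How does a regmatch_t object represent a match? If a regular expression pattern occurs five times in a target string, how are those five occurrences represented? We can guess that the regmatch_t structure must contain the position of the match within the target string. Also, if I pass in 10 regmatch_t objects and there are only five matches, how do I know after the function returns that the first five objects hold valid match information and the rest do not? Is the number of matches reported through a parameter or the return value? The function has no extra parameter, and a quick look at the RETURN VALUE section of the man page shows that the return value is an error code, not a match count. So the function must fill the trailing unused regmatch_t objects with a special value. That is reasoning. The guess will be confirmed or refuted as we read on; right or wrong, the answer is sure to appear later.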

Author: linux_Ultra  Time: 2009-6-30 09:32
Two more functions follow:

   size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                   size_t errbuf_size);

   void regfree(regex_t *preg);

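From past experience, regerror is like perror or strerror: it translates an error code into a readable string. regfree is like free and releases preg. But isn't preg an object we allocated ourselves? If it was not dynamically allocated by this group of functions, why must this group of functions free it? This question leads to a new guess: the regex_t structure must have pointer members, and regcomp must dynamically allocate a block of memory and point those members at it. That is why regfree is needed: it follows the preg parameter to its pointer members and frees the memory allocated earlier. Experience supports this too. Regular expressions differ in length and complexity, so if they are represented as state machines the number of states must differ. The binary form of every regular expression cannot possibly fit in sizeof(regex_t) bytes, so dynamic allocation is unavoidable. This kind of reasoning and guessing not only answers the question of how to use the functions, it also gives some insight into how they are implemented, an ability that matters especially when reading code. Note that although the parameter of a memory-releasing function is an input that passes nothing meaningful back out, it is not const-qualified in the prototype, because releasing memory is also a form of modification.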
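
The ownership split described above (the caller provides the regex_t, regcomp allocates behind it, regfree releases that memory) can be sketched as follows. The helper name text_matches and the flag choices are assumptions made for this illustration only.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches `pattern`,
 * 0 if regexec() reports anything other than a match,
 * and -1 if the pattern fails to compile. */
int text_matches(const char *pattern, const char *text)
{
    regex_t re;   /* the object itself is ours: here it lives on the stack */

    /* REG_NOSUB: only a yes/no answer is wanted, so no pmatch array. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;   /* compilation failed, so regfree() is not called */

    int rc = regexec(&re, text, 0, NULL, 0);

    /* Release whatever regcomp() allocated behind the regex_t. */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The regex_t never needs malloc or free of its own here; only the memory hanging off it is released by regfree, which matches the reasoning about pointer members.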

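We have only finished SYNOPSIS and have not yet read the description below, yet we already more or less know how to use these functions. What did that take? 1. reasoning, 2. experience, 3. sensitivity to English abbreviations. Now let's read the description and verify the guesses above as we go.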

DESCRIPTION
POSIX Regex Compiling
   regcomp()  is  used to compile a regular expression into a form that is
   suitable for subsequent regexec() searches.

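Sure enough, regcomp converts a regular expression into a binary form suitable for subsequent regexec() processing. The word subsequent tells you to call regcomp first and regexec afterwards. When reading documentation, the words that name concepts and the words that express relationships between concepts are the most important. In a terse document like a man page, the words expressing relationships are especially easy to overlook, because they are less conspicuous than definitions and often pass by in a single word. As an exercise, watch for words that express relationships between concepts in the text that follows.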

Author: linux_Ultra  Time: 2009-6-30 09:33
   regcomp() is supplied with preg, a pointer to a pattern buffer  storage
   area;  regex, a pointer to the null-terminated string and cflags, flags
   used to determine the type of compilation.

   All regular expression searching must be done via  a  compiled  pattern
   buffer,  thus  regexec()  must always be supplied with the address of a
   regcomp() initialized pattern buffer.

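"preg, a pointer to a pattern buffer storage area" tells us that we must allocate the space for the preg object ourselves and then pass its address, preg, to regcomp. The man page will not spell out "allocate the space yourself and then pass it to me"; that would be too silly. You have to work out for yourself what it is really trying to tell you.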

   cflags may be the bitwise-or of one or more of the following:

   REG_EXTENDED
            Use POSIX Extended Regular Expression syntax  when  interpreting
            regex. If  not  set,  POSIX Basic Regular Expression syntax is
            used.

   REG_ICASE
            Do not differentiate case.  Subsequent regexec() searches  using
            this pattern buffer will be case insensitive.

   REG_NOSUB
            Support  for  substring  addressing  of matches is not required.
            The nmatch and pmatch parameters to regexec() are ignored if the
            pattern buffer supplied was compiled with this flag set.

   REG_NEWLINE
            Match-any-character operators don’t match a newline.

            A  non-matching list ([^...])  not containing a newline does not
            match a newline.

            Match-beginning-of-line operator (^) matches  the  empty  string
            immediately  after  a newline, regardless of whether eflags, the
            execution flags of regexec(), contains REG_NOTBOL.

            Match-end-of-line operator ($) matches the empty string  immedi‐
            ately  before  a  newline, regardless of whether eflags contains
            REG_NOTEOL.

POSIX Regex Matching
   regexec() is used to match a null-terminated string against the precom‐
   piled  pattern  buffer,  preg. nmatch  and pmatch are used to provide
   information regarding the location of any matches.  eflags may  be  the
   bitwise-or  of  one  or  both  of REG_NOTBOL and REG_NOTEOL which cause
   changes in matching behavior described below.

   REG_NOTBOL
            The match-beginning-of-line operator always fails to match  (but
            see  the  compilation  flag  REG_NEWLINE above) This flag may be
            used when different portions of a string are passed to regexec()
            and the beginning of the string should not be interpreted as the
            beginning of the line.

   REG_NOTEOL
            The match-end-of-line operator always fails to  match  (but  see
            the compilation flag REG_NEWLINE above)

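As guessed earlier, since cflags and eflags have different names they must take different values, and those values are normally combined with bitwise-or. This article is about how to read and understand a man page rather than about the technology itself, so I will not explain in detail what each flag does. But let's do a few more abbreviation exercises, which help not only understanding but also memorising the flags; once you remember the common ones you need not look them up every time. REG_ICASE: ICASE means ignore case, a very common abbreviation. REG_NOSUB: SUB sometimes means substitute and sometimes substring; here it means substring. REG_NOTBOL: at first it is unclear what BOL is, but look at its counterpart REG_NOTEOL. From experience we know that EOF is end of file, so EOL should be end of line, and BOL should correspondingly be beginning of line.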
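
A quick way to check these readings of the flag names is to try them. The sketch below assumes a made-up helper, match_with_flags, that simply forwards cflags to regcomp and eflags to regexec; it is an illustration, not the only way to structure this.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile with `cflags`, execute with `eflags`.
 * Returns 1 on a match, 0 otherwise, -1 if compilation fails. */
int match_with_flags(const char *pattern, const char *text,
                     int cflags, int eflags)
{
    regex_t re;

    /* REG_NOSUB is added because only a yes/no answer is needed here. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

With this helper, `match_with_flags("hello", "HELLO", 0, 0)` reports no match while adding REG_ICASE as cflags makes it match, and `match_with_flags("^abc", "abc", 0, REG_NOTBOL)` reports no match because the beginning-of-line operator is told never to succeed.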

Author: linux_Ultra  Time: 2009-6-30 09:36

BYTE OFFSETS
   Unless  REG_NOSUB was set for the compilation of the pattern buffer, it
   is possible to obtain substring match addressing  information. pmatch
   must be dimensioned to have at least nmatch elements.  These are filled
   in by regexec() with substring match addresses.  Any  unused  structure
   elements will contain the value -1.

   The  regmatch_t  structure  which  is  the type of pmatch is defined in
   <regex.h>.

         typedef struct {
            regoff_t rm_so;
            regoff_t rm_eo;
         } regmatch_t;

   Each rm_so element that is not -1 indicates the  start  offset  of  the
   next  largest  substring  match  within the string.  The relative rm_eo
   element indicates the end offset of the match.

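Sure enough. We guessed that a regmatch_t object holds the position of a match and that, after regexec returns, the unused trailing regmatch_t objects must be marked with a special value. That special value is -1. The position consists of a start and an end, and one more guess tells you that rm_so means regmatch start offset and rm_eo means regmatch end offset. You need this kind of sensitivity: rm_so and rm_eo are identical except for s and e, and an s and e that denote opposite concepts are start and end, which is very common in program code. Another very common convention is to prefix structure member names with an abbreviation of the structure name, such as rm_ for regmatch here.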
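
The offsets can be put to work in a short sketch. One detail comes from the POSIX definition of regexec rather than from the quoted text: the entries of pmatch describe one match, with index 0 covering the whole match and index 1 upwards covering each parenthesised subexpression, rather than successive occurrences of the pattern. The helper name extract_group and the MAX_GROUPS bound are assumptions of this example.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* assumed upper bound for this sketch */

/* Hypothetical helper: copy the text matched by group `group`
 * (0 = whole match) into `out`. Returns the number of bytes copied,
 * or -1 if the pattern is invalid, nothing matches, the group is
 * unused (rm_so == -1), or `out` is too small. */
int extract_group(const char *pattern, const char *text, size_t group,
                  char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int len = -1;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAX_GROUPS, m, 0) == 0 && m[group].rm_so != -1) {
        /* rm_so is the start offset, rm_eo is one past the last byte. */
        size_t n = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (n < out_size) {
            memcpy(out, text + m[group].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }
    regfree(&re);
    return len;
}
```

For the pattern `([a-z]+)@([a-z.]+)` and the text `mail bob@example.com now`, group 0 is the whole address, group 1 the part before the @ and group 2 the part after it. For `(a)|(b)` matched against `b`, group 1 took no part in the match, so its rm_so is -1 and the helper returns -1.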

Posix Error Reporting
   regerror() is used to turn the error codes that can be returned by both
   regcomp() and regexec() into error message strings.

   regerror() is passed the error code, errcode, the pattern buffer, preg,
   a pointer to a character string buffer, errbuf, and  the  size  of  the
   string buffer, errbuf_size.  It returns the size of the errbuf required
   to contain the null-terminated error message string. If  both  errbuf
   and  errbuf_size  are  nonzero,  errbuf  is  filled  in  with the first
   errbuf_size - 1 characters of the error message and a terminating null.

POSIX Pattern Buffer Freeing
   Supplying  regfree()  with a precompiled pattern buffer, preg will free
   the memory allocated to the pattern buffer by  the  compiling  process,
   regcomp().

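This also agrees with the earlier guesses. regerror translates an error code into a readable string, and regfree releases the memory allocated inside the preg object.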
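
The sentence "It returns the size of the errbuf required" suggests a two-step pattern: ask for the size first, then fetch the message. A minimal sketch, assuming a made-up helper name regex_error_string:

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed message for `errcode`,
 * or NULL if memory runs out. The caller must free() the result. */
char *regex_error_string(int errcode, const regex_t *re)
{
    /* With a zero-sized buffer, regerror() writes nothing and only
     * reports the size needed, terminating null included. */
    size_t needed = regerror(errcode, re, NULL, 0);

    char *msg = malloc(needed);
    if (msg != NULL)
        regerror(errcode, re, msg, needed);
    return msg;
}
```

A fixed buffer such as `char msg[128]` works too; in that case regerror truncates the message to errbuf_size - 1 characters, as the quoted text says.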

Author: linux_Ultra  Time: 2009-6-30 09:37
RETURN VALUE
   regcomp()  returns  zero  for a successful compilation or an error code
   for failure.

   regexec() returns zero for a successful match or REG_NOMATCH for  fail‐
   ure.

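To keep the layout tidy, man pages pull RETURN VALUE out into a section of its own, which has always bothered me. If a man page describes several functions, then after reading the description of each function you should jump here to check its return value, rather than reading all the other descriptions first. In fact this man page is not even that tidy: the return value of regerror is given in the descriptive text above and not here. Clearly, listing return values separately at the end does not suit how people write or read. The current state is unsatisfactory: some return values are listed at the end and some in the descriptive text, so you have to hunt all over the page for them. I consider this a major flaw of man pages. By contrast, what makes beginners uncomfortable, namely that man pages are so terse and have no code examples, is not a flaw but a strength.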

ERRORS
   The following errors can be returned by regcomp():

   REG_BADBR
            Invalid use of back reference operator.

   REG_BADPAT
            Invalid use of pattern operators such as group or list.

   REG_BADRPT
            Invalid  use  of  repetition  operators such as using ’*’ as the
            first character.

   REG_EBRACE
            Un-matched brace interval operators.

   REG_EBRACK
            Un-matched bracket list operators.

   REG_ECOLLATE
            Invalid collating element.

   REG_ECTYPE
            Unknown character class name.

   REG_EEND
            Non specific error.  This is not defined by POSIX.2.

   REG_EESCAPE
            Trailing backslash.

   REG_EPAREN
            Un-matched parenthesis group operators.

   REG_ERANGE
            Invalid use of the range operator, e.g., the ending point of the
            range occurs prior to the starting point.

   REG_ESIZE
            Compiled  regular  expression  requires  a pattern buffer larger
            than 64Kb.  This is not defined by POSIX.2.

   REG_ESPACE
            The regex routines ran out of memory.

   REG_ESUBREG
            Invalid back reference to a subexpression.

CONFORMING TO
   POSIX.1-2001.

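After reading this part, a student asked me: it says above that regexec returns 0 on success and REG_NOMATCH on failure, so what does the error code REG_NOMATCH mean, and why is it not explained in the ERRORS section? This is a typical case of incomplete understanding. The text above says that regcomp returns 0 on success and an error code on failure, without saying which error codes; they are listed in detail in the ERRORS section. regcomp can fail for many reasons, and most of these error codes describe syntax errors in the regular expression. regexec, on the other hand, decides whether there is a match: it returns 0 if there is one and REG_NOMATCH if there is not. NOMATCH is simply "no match". The sentence itself already explains what the code means, so it is not explained again in the ERRORS section. This also shows how terse a man page is: not one wasted word.

Why did this student miss it? Again, insensitivity to English. To him REG_NOMATCH was just a string of capital letters, a symbol; he did not see "no match" in it and so felt that the symbol had to be explained in detail later. It did not occur to him that the symbol does double duty here: it explains itself.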
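
In code, this reading means that REG_NOMATCH is an ordinary answer rather than a fault. A sketch with a made-up helper name, search_compiled; the third branch is purely defensive, since the quoted page mentions only 0 and REG_NOMATCH as return values of regexec:

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: run an already compiled pattern against `text`.
 * Returns 1 for a match, 0 for REG_NOMATCH, -1 for any other code. */
int search_compiled(const regex_t *re, const char *text)
{
    int rc = regexec(re, text, 0, NULL, 0);

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;   /* "no match" is a result, not a failure to report */

    /* Defensive assumption: treat any other code as a real error. */
    char msg[128];
    regerror(rc, re, msg, sizeof msg);
    fprintf(stderr, "regexec failed: %s\n", msg);
    return -1;
}
```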

SEE ALSO
   grep(1), regex(7), GNU regex manual

COLOPHON
   This page is part of release 2.77 of the Linux  man-pages  project. A
   description  of  the project, and information about reporting bugs, can
   be found at http://www.kernel.org/doc/man-pages/.

GNU                            1998-05-08                         REGEX(3)

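The most valuable part of this final section of the man page is SEE ALSO. Because every man page sticks to its own topic and does not wander off it, you sometimes need to read several related man pages together to get an overview of a group of related topics. Some man pages have a BUGS section, which is also very important. The classic example is gets(3): after describing at length what the function is for, the BUGS section at the end says "Never use gets()". If you miss that sentence, everything you read before it was wasted.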
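
For readers wondering what to do instead, one common replacement is fgets, which takes the buffer size as a parameter. The helper name read_line below is made up for this sketch, and stripping the newline is a choice of the example rather than something fgets does itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `in` into `buf`, never storing
 * more than `size` bytes, and strip the trailing newline if present.
 * Returns 1 if a line was read, 0 on end of file or error. */
int read_line(char *buf, int size, FILE *in)
{
    if (size <= 0 || fgets(buf, size, in) == NULL)
        return 0;

    /* strcspn() gives the index of the first '\n', or of the
     * terminating null if the line had no newline. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

A line longer than the buffer is returned in pieces over several calls instead of overflowing it, which is the property gets cannot offer.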

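Author: 宇宙飛船  Time: 2009-6-30 09:43
I'll post some English reading material in a while too, also about the GNU tools. These are the tools electronics engineers make their living with.

Author: qupeng2008  Time: 2009-6-30 09:51
Bump~ even though I can't follow it~ O(∩_∩)O~

Author: linux_Ultra  Time: 2009-6-30 09:52
The author of this post has also released a Linux programming book under the GNU Free Documentation License.
It has some commercial purpose, but it is still worth a look:
http://djkings.javaeye.com/blog/218542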
