| MANDOC(3) | Library Functions Manual | MANDOC(3) | 
mandoc, deroff,
  mparse_alloc, mparse_copy,
  mparse_free, mparse_open,
  mparse_readfd, mparse_reset,
  mparse_result — mandoc macro compiler library
#include <sys/types.h>
#include <stdio.h>
#include <mandoc.h>

#define ASCII_NBRSP
#define ASCII_HYPH
#define ASCII_BREAK

struct mparse *
mparse_alloc(int options, enum mandoc_os oe_e, char *os_s);

void
mparse_free(struct mparse *parse);

void
mparse_copy(const struct mparse *parse);

int
mparse_open(struct mparse *parse, const char *fname);

void
mparse_readfd(struct mparse *parse, int fd, const char *fname);

void
mparse_reset(struct mparse *parse);

struct roff_meta *
mparse_result(struct mparse *parse);

#include <roff.h>

void
deroff(char **dest, const struct roff_node *node);

#include <sys/types.h>
#include <mandoc.h>
#include <mdoc.h>

extern const char * const * mdoc_argnames;
extern const char * const * mdoc_macronames;

#include <sys/types.h>
#include <mandoc.h>
#include <man.h>

extern const char * const * man_macronames;
The mandoc library parses a UNIX
  manual into an abstract syntax tree (AST). UNIX
  manuals are composed of mdoc(7) or
  man(7), and may be mixed with
  roff(7),
  tbl(7), and
  eqn(7) invocations.
The following describes a general parse sequence:
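  1. initiate a parsing sequence with mchars_alloc(3) and mparse_alloc();
  2. open a file with open(2) or mparse_open();
  3. parse it with mparse_readfd();
  4. close it with close(2);
  5. retrieve the syntax tree with mparse_result();
  6. if information about the validity of the input is needed, fetch it
     with mparse_updaterc();
  7. iterate over parse nodes, starting from the first member of the
     returned struct roff_meta;
  8. free all allocated memory with mparse_free() and mchars_free(3), or
     invoke mparse_reset() and go back to step 2 to parse new files.

This section documents the functions, types, and variables available via
  <mandoc.h>, with the exception of those documented in mandoc_escape(3)
  and mchars_alloc(3).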
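struct mparse
      An opaque pointer to a running parse sequence. Created with
      mparse_alloc() and freed with mparse_free(). This may be used
      across parsed input if mparse_reset() is called between parses.

deroff()
      Obtain a text-only representation of a struct roff_node, including
      text contained in its child nodes. When it is no longer needed, the
      string stored in dest by deroff() can be passed to free(3).

mparse_alloc()
      Allocate a parser. The arguments have the following effect:

      options
          When the MPARSE_MDOC or MPARSE_MAN bit is set, only that parser
          is used. Otherwise, the document type is automatically detected.

          When the MPARSE_SO bit is set, roff(7) so file inclusion
          requests are always honoured. Otherwise, if the request is the
          only content in an input file, only the file name is
          remembered, to be returned in the sodest field of struct
          roff_meta.
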
When the MPARSE_QUICK bit is set,
            parsing is aborted after the NAME section. This is for example
            useful in
            makewhatis(8)
            -Q to quickly build minimal databases.
When the MPARSE_VALIDATE bit is
            set, mparse_result() runs the validation
            functions before returning the syntax tree. This is almost always
            required, except in certain debugging scenarios, for example to dump
            unvalidated syntax trees.

      oe_e
          Operating system to check base system conventions for. If
          MANDOC_OS_OTHER, the system is automatically detected from Os,
          -Ios, or uname(3).

      os_s
          A default string for the mdoc(7) Os macro, overriding the
          OSNAME preprocessor definition and the results of uname(3).
          Passing NULL sets no default.

      The same parser may be used for multiple files so long as
      mparse_reset() is called between parses. mparse_free() must be
      called to free the memory allocated by this function. Declared in
      <mandoc.h>, implemented in read.c.
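
As a rough illustration of how the option bits combine, the sketch below ORs several flags together and reports which parser the rules above would select. The numeric bit values and the helpers parser_choice() and describe_options() are assumptions made so that the example compiles on its own. Real programs should take the MPARSE_* definitions from the library headers and pass the combined value as the options argument of mparse_alloc().

```c
#include <stdio.h>

/*
 * Placeholder bit values, used only so this sketch compiles on its own.
 * They are assumptions; real programs take the definitions from the
 * mandoc headers instead.
 */
#ifndef MPARSE_MDOC
#define MPARSE_MDOC      (1 << 0)  /* use only the mdoc(7) parser */
#define MPARSE_MAN       (1 << 1)  /* use only the man(7) parser */
#define MPARSE_SO        (1 << 2)  /* always honour so requests */
#define MPARSE_QUICK     (1 << 3)  /* stop after the NAME section */
#define MPARSE_VALIDATE  (1 << 6)  /* validate before returning the tree */
#endif

/*
 * Hypothetical helper: name the parser that the documented rules select.
 * The text does not say what happens when both bits are set; this sketch
 * arbitrarily reports mdoc in that case.
 */
const char *
parser_choice(int options)
{
	if (options & MPARSE_MDOC)
		return "mdoc";
	if (options & MPARSE_MAN)
		return "man";
	return "autodetect";
}

/* Hypothetical helper: print a one-line summary of an options word. */
void
describe_options(int options)
{
	printf("parser=%s so=%d quick=%d validate=%d\n",
	    parser_choice(options),
	    (options & MPARSE_SO) != 0,
	    (options & MPARSE_QUICK) != 0,
	    (options & MPARSE_VALIDATE) != 0);
}
```

With these placeholder values, describe_options(MPARSE_SO | MPARSE_VALIDATE) prints "parser=autodetect so=1 quick=0 validate=1".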

mparse_free()
      Free all memory allocated by mparse_alloc(). Declared in
      <mandoc.h>, implemented in read.c.

mparse_copy()
      Dump a copy of the input to the standard output; used for -man
      -Tman. Declared in <mandoc.h>, implemented in read.c.

mparse_open()
      Open the file for reading. If that fails and fname does not already
      end in ‘.gz’, try again after appending ‘.gz’. Save the information
      whether the file is zipped or not. Return a file descriptor open
      for reading or -1 on failure. It can be passed to mparse_readfd()
      or used directly. Declared in <mandoc.h>, implemented in read.c.

mparse_readfd()
      Parse a file descriptor opened with open(2) or mparse_open(). Pass
      the associated filename in fname. This function may be called
      multiple times with different parameters; however, close(2) and
      mparse_reset() should be invoked between parses. Declared in
      <mandoc.h>, implemented in read.c.

mparse_reset()
      Reset a parser so that mparse_readfd() may be used again. Declared
      in <mandoc.h>, implemented in read.c.

mparse_result()
      Obtain the result of a parse. Declared in <mandoc.h>, implemented
      in read.c.

The following non-printing characters may be embedded in text strings:
ASCII_NBRSP
      A non-breaking space character.
ASCII_HYPH
      A soft hyphen.
ASCII_BREAK
      A breakable zero-width space.

Escape characters are also passed verbatim into text strings. An escape character is a sequence of characters beginning with the backslash (‘\’). To construct human-readable text, these should be intercepted with mandoc_escape(3) and converted with one of the functions described in mchars_alloc(3).
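
The handling of the non-printing characters can be sketched as follows. The example assumes that ASCII_NBRSP stands for a non-breaking space, ASCII_HYPH for a hyphen at which a line may be broken, and ASCII_BREAK for a zero-width break opportunity. The fallback numeric values and the helper plain_text() are assumptions made so that the sketch compiles without the library headers. Escape sequences are deliberately left alone here, since those are the business of mandoc_escape(3).

```c
#include <stddef.h>

/*
 * Fallback values so the sketch compiles without the library headers.
 * They are assumptions; real programs should use the definitions
 * from <mandoc.h>.
 */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP 31
#define ASCII_HYPH  30
#define ASCII_BREAK 29
#endif

/*
 * Hypothetical helper: rewrite a text string in place for plain ASCII
 * output.  ASCII_NBRSP becomes a space, ASCII_HYPH becomes '-', and
 * ASCII_BREAK is dropped.  Backslash escapes are copied unchanged.
 * Returns the new length of the string.
 */
size_t
plain_text(char *s)
{
	char *src, *dst;

	for (src = dst = s; *src != '\0'; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			*dst++ = ' ';
			break;
		case ASCII_HYPH:
			*dst++ = '-';
			break;
		case ASCII_BREAK:
			break;	/* zero-width: emit nothing */
		default:
			*dst++ = *src;
			break;
		}
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```

A terminal formatter would normally use these characters to decide where lines may be broken before replacing them. This sketch skips that step and only shows the final substitution.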
For man(7) documents, the AST is composed of struct roff_node nodes with element, root and text types as declared by the type field. Each node also provides its parse point (the line, pos, and sec fields), its position in the tree (the parent, child, next and prev fields) and some type-specific data.
The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.
The only elements capable of nesting other elements are those with next-line scope as documented in man(7).
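
To illustrate how the child and next links are followed, the sketch below gathers the text of a node and all of its descendants into one allocated string, similar in spirit to what deroff() provides. It is not the library's implementation: struct sketch_node is a simplified stand-in for struct roff_node, and collect_text() is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct roff_node; NOT the library's definition. */
enum sketch_type { SK_ROOT, SK_ELEMENT, SK_TEXT };

struct sketch_node {
	enum sketch_type	 type;
	const char		*string;	/* text of SK_TEXT nodes, else NULL */
	struct sketch_node	*child;		/* first child */
	struct sketch_node	*next;		/* next sibling */
};

/*
 * Append the text of n and of all its descendants to *dest, separated
 * by single spaces.  *dest must be NULL or a malloc(3)ed string; the
 * caller passes the result to free(3).
 */
void
collect_text(char **dest, const struct sketch_node *n)
{
	size_t	 oldlen, addlen;
	char	*cp;

	if (n->type == SK_TEXT && n->string != NULL) {
		oldlen = *dest == NULL ? 0 : strlen(*dest);
		addlen = strlen(n->string);
		/* Room for old text, one space, new text, and the NUL. */
		cp = realloc(*dest, oldlen + addlen + 2);
		if (cp == NULL)
			return;		/* keep what was gathered so far */
		if (oldlen > 0)
			cp[oldlen++] = ' ';
		memcpy(cp + oldlen, n->string, addlen + 1);
		*dest = cp;
	}
	/* Visit the children in document order. */
	for (n = n->child; n != NULL; n = n->next)
		collect_text(dest, n);
}
```

Starting with a NULL destination pointer lets realloc(3) perform the first allocation, so the caller needs no separate setup step.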
For mdoc(7) documents, the AST is composed of struct roff_node nodes with block, head, body, element, root and text types as declared by the type field. Each node also provides its parse point (the line, pos, and sec fields), its position in the tree (the parent, child, last, next and prev fields) and some type-specific data, in particular, for nodes generated from macros, the generating macro in the tok field.
The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the BLOCK production: these refer to punctuation marks. Furthermore, although a TEXT node will generally have a non-zero-length string, in the specific case of ‘.Bd -literal’, an empty line will produce a zero-length string. Multiple body parts are only found in invocations of ‘Bl -column’, where a new body introduces a new phrase.
The mdoc(7) syntax tree accommodates broken block structures as well. The ENDBODY node is available to end the formatting associated with a given block before the physical end of that block. It has a non-null end field, is of the BODY type, has the same tok as the BLOCK it is ending, and has a pending field pointing to that BLOCK's BODY node. It is an indirect child of that BODY node and has no children of its own.
An ENDBODY node is generated when a block ends while one of its child blocks is still open, as in the following example:
.Ao ao
.Bo bo ac
.Ac bc
.Bc end
This example results in the following block structure:
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
Here, the formatting of the Ao block
    extends from TEXT ao to TEXT ac, while the formatting of the
    Bo block extends from TEXT bo to TEXT bc. It renders
    as follows in -Tascii
  mode:
<ao [bo ac> bc] end

Support for badly-nested blocks is only provided for backward
    compatibility with some older
    mdoc(7) implementations. Using
    badly-nested blocks is strongly discouraged; for example,
    the -Thtml front-end to
    mandoc(1) is unable to render
    them in any meaningful way. Furthermore, behaviour when encountering
    badly-nested blocks is not consistent across troff implementations,
    especially when using multiple levels of badly-nested blocks.
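
To make the block structure of the Ao/Bo example concrete, the following sketch builds that tree by hand and prints it in the same indented form. It shows that the ENDBODY node sits inside the Bo body and is therefore only an indirect child of the Ao body. struct dump_node is a simplified stand-in for struct roff_node, and dump_tree() and dump_example() are hypothetical helpers. The real library builds such trees itself during parsing.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for struct roff_node; NOT the library's definition. */
struct dump_node {
	const char		*type;	/* "BLOCK", "HEAD", "BODY", "TEXT", ... */
	const char		*name;	/* macro name or text content */
	const char		*note;	/* extra annotation, or NULL */
	struct dump_node	*child;	/* first child */
	struct dump_node	*next;	/* next sibling */
};

/*
 * Append one line per node to buf, depth first, indenting four spaces
 * per tree level.  buf must hold a NUL-terminated string on entry.
 */
void
dump_tree(const struct dump_node *n, int depth, char *buf, size_t bufsz)
{
	size_t len;

	for (; n != NULL; n = n->next) {
		len = strlen(buf);
		snprintf(buf + len, bufsz - len, "%*s%s %s%s%s\n",
		    depth * 4, "", n->type, n->name,
		    n->note != NULL ? ", " : "",
		    n->note != NULL ? n->note : "");
		dump_tree(n->child, depth + 1, buf, bufsz);
	}
}

/* Build the Ao/Bo example tree bottom-up and dump it into buf. */
void
dump_example(char *buf, size_t bufsz)
{
	struct dump_node t_end = { "TEXT", "end", NULL, NULL, NULL };
	struct dump_node t_bc = { "TEXT", "bc", NULL, NULL, NULL };
	struct dump_node endbody =
	    { "ENDBODY", "Ao", "pending -> Ao", NULL, &t_bc };
	struct dump_node t_ac = { "TEXT", "ac", NULL, NULL, &endbody };
	struct dump_node t_bo = { "TEXT", "bo", NULL, NULL, &t_ac };
	struct dump_node body_bo = { "BODY", "Bo", NULL, &t_bo, NULL };
	struct dump_node head_bo = { "HEAD", "Bo", NULL, NULL, &body_bo };
	struct dump_node blk_bo =
	    { "BLOCK", "Bo", "pending -> Ao", &head_bo, NULL };
	struct dump_node t_ao = { "TEXT", "ao", NULL, NULL, &blk_bo };
	struct dump_node body_ao = { "BODY", "Ao", NULL, &t_ao, NULL };
	struct dump_node head_ao = { "HEAD", "Ao", NULL, NULL, &body_ao };
	struct dump_node blk_ao = { "BLOCK", "Ao", NULL, &head_ao, &t_end };

	buf[0] = '\0';
	dump_tree(&blk_ao, 0, buf, bufsz);
}
```

Calling dump_example() with a buffer of a few hundred bytes reproduces the twelve-line block structure listed for this example.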
The mandoc library was written by
  Kristaps Dzonsons
  <kristaps@bsd.lv> and is
  maintained by Ingo Schwarze
  <schwarze@openbsd.org>.
| December 30, 2018 | NetBSD 9.3 |