<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- <rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" category="info" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3"> -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-sakemi-areion-01" category="info" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">

  <front>
    <title abbrev="Areion">Areion: Highly-Efficient Permutations and Its Applications</title>
    <seriesInfo name="Internet-Draft" value="draft-sakemi-areion-01"/>
    <author fullname="Yumi Sakemi" initials="Y." surname="Sakemi" role="editor">
      <organization>GMO CONNECT Inc.</organization>
      <address>
        <email>sakemi-yumi@gmo-connect.jp</email>
      </address>
    </author>
    <author initials="S." surname="Kanno" fullname="Satoru Kanno">
      <organization>GMO CONNECT Inc.</organization>
      <address>
        <email>kanno@gmo-connect.jp</email>
      </address>
    </author>
    <author fullname="Takanori Isobe" initials="T." surname="Isobe">
      <organization>The University of Osaka</organization>
      <address>
        <email>takanori.isobe@ist.osaka-u.ac.jp</email>
      </address>
    </author>
    <date year="2026" month="February" day="16"/>
    <keyword>Low Latency, AES instruction, Authenticated encryption</keyword>
    <abstract>
      <t>This document specifies a series of cryptographic wide-block permutations referred to as Areion-256 and Areion-512. These permutations are constructed using AES round operations and are designed for ultra-low latency implementations on modern processors with AES instructions. The Areion permutations can be used as building blocks in various cryptographic constructions, including authenticated encryption and hashing of relatively short input data. Additionally, it describes AEAD schemes and hash functions constructed from Areion.</t>
    </abstract>
  </front>
  <middle>
    <section anchor="Intro">
      <name>Introduction</name>
      <t>The recent evolution of communication technologies demands unprecedented performance in data processing, especially in networking and secure data transmission. To address these demands, cryptographic primitives must be designed to minimize latency while maintaining strong security guarantees. In particular, data-centric architectures such as Content Delivery Networks (CDNs), IoT devices, and 6G networks require cryptographic primitives that offer both robust security and high efficiency. Wide-block ciphers, which operate on larger blocks than traditional block ciphers, have emerged as a promising way to address the limitations of 128-bit block sizes, such as those in AES-GCM. Notably, concerns that the birthday bound limits AES-GCM to 64-bit data security have driven research into cryptographic primitives that provide beyond-birthday-bound security while maintaining competitive performance.</t>
      <t>As shown in <xref target="SP800-38A"/>, modes of operation for block ciphers often have inherent limitations in terms of security bounds due to the fixed block size. For instance, AES-GCM provides a data security bound limited to approximately 2^64 encrypted blocks when using a 128-bit block cipher like AES. This limitation can be problematic in high-throughput applications where large amounts of data are encrypted under a single key. Therefore, there is a growing need for cryptographic schemes that can offer higher data security bounds, such as 2^128, while also being efficient in practice. Moreover, recent discussions, such as those in the public comments on NIST SP800-38A <xref target="PublicCommentOnSP800-38A"/>, have emphasized the need for re-evaluating block cipher modes and considering alternatives that can provide stronger security guarantees and better performance characteristics for modern applications.</t>
      <t>The focus on short messages is driven by real-world data, as detailed in <xref target="Areion"/>. In communication environments, the overwhelming majority of cases require encryption or hashing of short inputs (up to 2K bytes). For example, 44% of "real-world" TCP/IP packets are between 40 and 100 bytes long. Furthermore, the maximum packet lengths for protocols such as Zigbee (127 bytes) and Bluetooth Low Energy (47 bytes) are less than 128 bytes. Existing schemes are often not optimized for these common short inputs.</t>
      <t>Areion is a novel cryptographic primitive designed to meet these demands by providing highly efficient wide-block permutations based on AES round operations. By using modern CPU instructions for AES and SIMD (Single Instruction Multiple Data) operations, Areion achieves ultra-low latency encryption and decryption while maintaining strong security guarantees. The Areion permutations, Areion-256 and Areion-512, operate on 256-bit and 512-bit blocks respectively, providing a flexible and scalable foundation for constructing secure cryptographic schemes.</t>
      <t>This document focuses on the design and specification of the Areion permutation algorithm, detailing the construction of Areion-256 and Areion-512. The Areion permutations are designed to be efficient when implemented using AES instructions and SIMD capabilities on modern processors, making them suitable for high-performance cryptographic applications. The design leverages AES round operations, including SubBytes (SB), ShiftRows (SR), MixColumns (MC), and AddRoundConstant (AC), to construct a secure and efficient permutation suitable for various cryptographic constructions.</t>
      <t>The design and analysis of Areion have been studied in detail in <xref target="Areion"/>; this document provides an implementation-oriented specification suitable for Internet deployment.</t>
      <t>Areion's design is deeply influenced by the AES instruction set and modern processor architectures, allowing it to achieve high throughput and low latency. By carefully selecting the number of rounds, round constants, and the structure of the permutation, Areion aims to provide a secure and efficient permutation that can serve as a building block for authenticated encryption, hashing, and other cryptographic primitives. This document specifies the full algorithmic details necessary to implement the Areion permutations and their use in an AEAD scheme based on the OPP (Offset Public Permutation) mode.</t>
      <t>This document gives a detailed specification of the Areion permutations.</t>
      <t>Note: In addition to the permutations themselves, this document covers their use in authenticated encryption and the hash functions built from them.</t>
      <section anchor="Conv">
        <name>Conventions Used in This Document</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section anchor="design-rationale">
      <name>Design Rationale</name>
      <t>Areion's design is based on several key principles derived from the extended analysis in <xref target="Areion"/>, balancing high performance on modern CPUs with robust security margins.</t>
      <section anchor="rationale-aes-ni">
        <name>Maximal Use of AES Instructions</name>
        <t>Areion is designed to be implemented solely using AES instructions, such as <tt>aesenc</tt> in Intel AES-NI or the <tt>vaeseq_u8</tt> and <tt>vaesmcq_u8</tt> intrinsics in ARMv8 NEON. These instructions are among the most efficient and cryptographically strong operations available in modern SIMD instruction sets. This approach avoids slower shuffle operations and leverages the deep security analysis of the AES round function.</t>
      </section>
      <section anchor="rationale-pipeline">
        <name>Pipeline-Friendly Feistel Structure</name>
        <t>Modern processors can execute multiple AES instructions in parallel through pipelining (e.g., Intel Ice Lake can pipeline up to 6 <tt>aesenc</tt> instructions). Areion adopts a "pipeline-friendly" Feistel-type structure, which, unlike traditional Feistel schemes or Simpira v2, adds F-functions in a way that takes full advantage of this hardware parallelism. This structure allows for more AES instructions to be executed in parallel within a single round, significantly reducing latency.</t>
      </section>
      <section anchor="rationale-search">
        <name>Systematic Search for Optimal Functions</name>
        <t>The specific F-functions used in Areion-256 (F_1, F_2) and Areion-512 (F_0, F_1, F_3) were not chosen arbitrarily. They are the result of a systematic search over all possible combinations of 1- and 2-round AES operations. The chosen structures, (2, 1)-perm for Areion-256 and (0, 1, 0, 3, pi_1)-perm for Areion-512, were identified as providing the best trade-off between the lowest number of AES instructions required and the highest security level achieved against differential, linear, impossible differential, and integral attacks.</t>
      </section>
      <section anchor="rationale-rounds">
        <name>Round Counts and Security Margins</name>
        <t>The number of rounds, 10 for Areion-256 and 15 for Areion-512, was determined by detailed security analysis. This analysis (detailed in <xref target="security-considerations"/>) identified the longest possible attacks (e.g., 5-round zero-sum for Areion-256, 10-round zero-sum for Areion-512) and established the full round counts to provide a sufficient security margin against all known cryptanalytic techniques.</t>
      </section>
    </section>

    <section anchor="design-of-areion-permutation">
      <name>Design of Areion Permutation</name>
      <t>The Areion permutation algorithm is designed to provide ultra-low latency cryptographic operations while maintaining strong security properties. The design leverages AES round operations and modern CPU instruction sets to construct efficient 256-bit and 512-bit permutations. These permutations serve as the core components for various cryptographic modes, including authenticated encryption and hashing. This section outlines the notations and the structure of the Areion permutation algorithm for both Areion-256 and Areion-512.</t>
      <t>The round functions for Areion-256 and Areion-512 are illustrated in <xref target="fig-round-functions"/>.</t>
      <figure anchor="fig-round-functions">
        <name>The round functions of Areion <xref target="Areion"/></name>
        <artwork><![CDATA[
  x0_r   x1_r              x0_r   x1_r   x2_r   x3_r
   |      |                 |      |      |      |
   |---.  |                 |---.  |      |---.  |
   |   |  |                 |   |  |      |   |  |
   |   v  |                 |   v  |      |   v  |
   |  [F2] |                 |  [F0] |      |  [F0] |
   |   |  |                 |   |  |      |   |  |
   |   '--+-->(+)            |   '--+-->(+) |   '--+-->(+)
   |      |                 |      |      |      |
   v      |                 v      |      v      |
  [F1]    |                [F1]    |     [F3]    |
   |      |                 |      |      |      |
   '------'                 '------'      '------'
     | |                      | |            | |
     | |                      | '----.  .----' |
     X                        |      |  |      |
    / \                       |      |  |      |
   /   \                      v      v  v      v
  v     v                   x0_r+1 x1_r+1 x2_r+1 x3_r+1
x0_r+1 x1_r+1

(a) Areion-256             (b) Areion-512
]]></artwork>
      </figure>

      <section anchor="notations">
        <name>Notations</name>
        <dl>
          <dt>SB:</dt>
          <dd>
            <t>SubBytes</t>
          </dd>
          <dt>SR:</dt>
          <dd>
            <t>ShiftRows</t>
          </dd>
          <dt>MC:</dt>
          <dd>
            <t>MixColumns</t>
          </dd>
          <dt>AC:</dt>
          <dd>
            <t>AddRoundConstant. This operation adds a round constant to the state. It is analogous to the AddRoundKey operation in AES, but a fixed constant is XORed into the state instead of a round key.</t>
          </dd>
          <dt><tt>^</tt>:</dt>
          <dd>
            <t>Bitwise XOR operation</t>
          </dd>
          <dt><tt>◦</tt>:</dt>
          <dd>
            <t>Function composition, where the function on the right is applied first</t>
          </dd>
        </dl>
      </section>
      <section anchor="functions">
        <name>Functions</name>
        <t>Based on the operations in the AES round function, we define four 128-bit functions F_i for i in {0, 1, 2, 3}. Each function maps a 128-bit state to a 128-bit state. The AES round operations SubBytes, ShiftRows, MixColumns, and AddRoundConstant are denoted by SB, SR, MC, and AC, respectively. AC(x, C) denotes the bitwise XOR of a 128-bit constant C to the 128-bit state x.</t>
        <ul spacing="normal">
          <li>
            <t>F_0(x) = MC ◦ SR ◦ SB(x)</t>
          </li>
          <li>
            <t>F_1(x) = SR ◦ SB(x)</t>
          </li>
          <li>
            <t>For a given round index r, F_2^{(r)}(x) = MC ◦ SR ◦ SB ◦ AC( MC ◦ SR ◦ SB(x), RC_r )</t>
          </li>
          <li>
            <t>For a given round index r, F_3^{(r)}(x) = MC ◦ SR ◦ SB ◦ AC( SR ◦ SB(x), RC_r )</t>
          </li>
        </ul>
        <t>The constants RC_r used in AC(·, RC_r) are the round constants defined in <xref target="round-constants"/>. F_0 and F_1 do not use round constants.</t>
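        <t>As an illustration only, the four functions can be sketched in Python on 16-byte strings. The byte layout (byte index = row + 4 × column, as in FIPS 197), the representation of a round constant as a 16-byte string supplied by the caller, and all helper names are assumptions of this sketch, not requirements of this document.</t>
        <artwork><![CDATA[
```python
def _xtime(b):
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def _gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = _xtime(a)
        b >>= 1
    return r

def _rotl8(v, n):
    return ((v << n) | (v >> (8 - n))) & 0xFF

def _build_sbox():
    """AES S-box: multiplicative inverse followed by the affine map."""
    box = []
    for v in range(256):
        inv = 0 if v == 0 else next(c for c in range(1, 256) if _gf_mul(v, c) == 1)
        box.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                   ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return box

SBOX = _build_sbox()

def SB(x):
    """SubBytes on a 16-byte state."""
    return bytes(SBOX[b] for b in x)

def SR(x):
    """ShiftRows: row r is rotated left by r columns."""
    return bytes(x[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))

def MC(x):
    """MixColumns: multiply each column by the fixed AES matrix."""
    out = bytearray(16)
    for c in range(4):
        a = x[4 * c:4 * c + 4]
        for r in range(4):
            out[4 * c + r] = (_gf_mul(a[r], 2) ^ _gf_mul(a[(r + 1) % 4], 3)
                              ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return bytes(out)

def AC(x, rc):
    """AddRoundConstant: XOR a 16-byte constant into the state."""
    return bytes(u ^ v for u, v in zip(x, rc))

def F0(x):
    return MC(SR(SB(x)))

def F1(x):
    return SR(SB(x))

def F2(x, rc):
    return MC(SR(SB(AC(MC(SR(SB(x))), rc))))

def F3(x, rc):
    return MC(SR(SB(AC(SR(SB(x)), rc))))
```
]]></artwork>
        <t>In this sketch the S-box is generated from its algebraic definition rather than stored as a table, which keeps the listing short at the cost of start-up time; a production implementation would use the AES instructions directly.</t>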
      </section>
      <section anchor="round-constants">
        <name>Round Constants</name>
        <t>Areion uses 128-bit round constants RC_r for r = 0, 1, ..., 14. The constants are derived from the binary expansion of the fractional part of π and are given in hexadecimal notation in Table&nbsp;1. Each constant is used in little-endian byte order when applied by AC(x, RC_r).</t>
        <table anchor="rc-table">
          <name>Round constants RC_r</name>
          <thead>
            <tr>
              <th>r</th>
              <th>RC_r (hexadecimal)</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>0</td><td>0x243f6a8885a308d313198a2e03707344</td></tr>
            <tr><td>1</td><td>0xa4093822299f31d0082efa98ec4e6c89</td></tr>
            <tr><td>2</td><td>0x452821e638d01377be5466cf34e90c6c</td></tr>
            <tr><td>3</td><td>0xc0ac29b7c97c50dd3f84d5b5b5470917</td></tr>
            <tr><td>4</td><td>0x9216d5d98979fb1bd1310ba698dfb5ac</td></tr>
            <tr><td>5</td><td>0x2ffd72dbd01adfb7b8e1afed6a267e96</td></tr>
            <tr><td>6</td><td>0xba7c9045f12c7f9924a19947b3916cf7</td></tr>
            <tr><td>7</td><td>0x801f2e2858efc16636920d871574e690</td></tr>
            <tr><td>8</td><td>0xa458fea3f4933d7e0d95748f728eb658</td></tr>
            <tr><td>9</td><td>0x718bcd5882154aee7b54a41dc25a59b5</td></tr>
            <tr><td>10</td><td>0x9c30d5392af26013c5d1b023286085f0</td></tr>
            <tr><td>11</td><td>0xca417918b8db38ef8e79dcb0603a180e</td></tr>
            <tr><td>12</td><td>0x6c9e0e8bb01e8a3ed71577c1bd314b27</td></tr>
            <tr><td>13</td><td>0x78af2fda55605c60e65525f3aa55ab94</td></tr>
            <tr><td>14</td><td>0x5748986263e8144055ca396a2aab10b6</td></tr>
          </tbody>
        </table>
        <t>In round r of Areion-256 (0 ≤ r ≤ 9) and Areion-512 (0 ≤ r ≤ 14), the constant RC_r is added to exactly one 128-bit word via AC(x, RC_r) inside F_2^{(r)} or F_3^{(r)}, respectively.</t>
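        <t>The derivation from π can be checked with a short Python sketch that computes the fractional part of π in fixed-point integer arithmetic using Machin's formula. The function names and the number of guard bits are assumptions of this sketch, and only RC_0 and RC_1 are compared here; the table above remains the normative source for all constants.</t>
        <artwork><![CDATA[
```python
def _arctan_inv(n, scale):
    """Approximate arctan(1/n) * scale with the alternating Taylor series."""
    term = scale // n
    total = term
    n2, k, sign = n * n, 1, -1
    while term:
        term //= n2
        k += 2
        total += sign * (term // k)
        sign = -sign
    return total

def pi_fraction_words(count, width=128, guard=64):
    """Return the first `count` words of `width` bits of the fraction of pi."""
    bits = count * width
    scale = 1 << (bits + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * _arctan_inv(5, scale) - 4 * _arctan_inv(239, scale)
    # Drop the integer part (3) and the guard bits that absorb rounding error.
    frac = (pi_scaled - 3 * scale) >> guard
    mask = (1 << width) - 1
    return [(frac >> (width * (count - 1 - i))) & mask for i in range(count)]

rc = pi_fraction_words(2)
for r, value in enumerate(rc):
    print(r, format(value, "032x"))
```
]]></artwork>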
      </section>
      
      <section anchor="areion-256-permutation">
        <name>Areion-256 Permutation</name>
        <dl>
          <dt>Input:</dt>
          <dd>
            <t>A 256-bit block divided into two 128-bit halves: <tt>x0</tt> and <tt>x1</tt>.</t>
          </dd>
        </dl>
        <t>Procedures:</t>
        <t>Let <tt>(x0, x1)</tt> be the two 128-bit words of the input.</t>
        <t>The Areion-256 permutation processes 10 rounds organized as 5 pairs. Each pair of rounds swaps the parameter positions to enable efficient pipelining:</t>
        <artwork><![CDATA[
          For i from 0 to 9 step 2:
            Round i:   (x0, x1) := RoundFunction256(x0, x1, i)
            Round i+1: (x1, x0) := RoundFunction256(x1, x0, i+1)

          Where RoundFunction256(a, b, r) is defined as:
            rc0 = RC_r (round constant from Table 1)
            rc1 = 0^128 (all-zero 128-bit value)
            b := aesenc(aesenc(a, rc0), b)
            a := aesenclast(a, rc1)
            return (a, b)

        ]]></artwork>
        <t>Here, <tt>aesenc(state, key)</tt> performs SubBytes, ShiftRows, and MixColumns, then XORs the result with <tt>key</tt> (corresponding to <tt>_mm_aesenc_si128</tt> in Intel AES-NI or an equivalent), and <tt>aesenclast(state, key)</tt> performs SubBytes and ShiftRows, then XORs the result with <tt>key</tt>, omitting MixColumns (corresponding to <tt>_mm_aesenclast_si128</tt>).</t>

        <dl>
          <dt>Output:</dt>
          <dd>
            <t>Concatenation of <tt>x0</tt> and <tt>x1</tt> after all rounds.</t>
          </dd>
        </dl>
        <t><strong>Implementation Note:</strong> The parameter swapping between round pairs <tt>(x0,x1) → (x1,x0)</tt> is critical for correct operation and matches the official reference implementation.</t>
      </section>
      <section anchor="areion-256-inverse-permutation">
        <name>Inverse Areion-256 Permutation</name>
        <t>The inverse permutation, denoted <tt>Areion-256-Inverse</tt>, reverses the steps of the forward permutation. It takes a 256-bit block <tt>(x0, x1)</tt> as input and returns the original block.</t>
        <t>The inverse permutation iterates from round <tt>r = 9</tt> down to <tt>0</tt>, processing two rounds at a time to mirror the forward pair structure.</t>
        <artwork><![CDATA[
          // Initial state: (x0, x1) from forward output
          
          For i from 8 down to 0 step 2:
            // Inverse Round i+1 (Odd round: Input was (x1, x0))
            // In forward round i+1: x1 = F_1(x1), x0 = x0 XOR F_2(x1, i+1)
            // Inverse operations:
            x1 = F_1_Inverse(x1)
            x0 = x0 XOR F_2(x1, i+1)

            // Inverse Round i (Even round: Input was (x0, x1))
            // In forward round i: x0 = F_1(x0), x1 = x1 XOR F_2(x0, i)
            // Inverse operations:
            x0 = F_1_Inverse(x0)
            x1 = x1 XOR F_2(x0, i)

          return (x0, x1)

          Where F_1_Inverse(x) is defined as:
            return aesdeclast(x, 0^128)
            
          And F_2(x, i) is the same forward function used in encryption.
        ]]></artwork>
        <t>Note: <tt>aesdeclast(state, key)</tt> performs InverseShiftRows, InverseSubBytes, then XOR with key. This corresponds exactly to the inverse of <tt>aesenclast</tt> with a zero key.</t>
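        <t>The pair-swapped round schedule and its inverse can be sketched in Python independently of AES by passing F_1, its inverse, and F_2 as callables on 128-bit integers. The function names and the toy stand-ins below are hypothetical and are not the AES-based functions; they only serve to check that the inverse schedule undoes the forward schedule.</t>
        <artwork><![CDATA[
```python
MASK128 = (1 << 128) - 1

def areion256_schedule(x0, x1, f1, f2, rounds=10):
    """Forward schedule: rounds are processed in pairs with swapped operands."""
    def round_fn(a, b, r):
        b ^= f2(a, r)      # b := aesenc(aesenc(a, RC_r), b)
        a = f1(a)          # a := aesenclast(a, 0^128)
        return a, b
    for i in range(0, rounds, 2):
        x0, x1 = round_fn(x0, x1, i)
        x1, x0 = round_fn(x1, x0, i + 1)
    return x0, x1

def areion256_schedule_inverse(x0, x1, f1_inv, f2, rounds=10):
    """Inverse schedule: undo round i+1, then round i, last pair first."""
    for i in range(rounds - 2, -1, -2):
        x1 = f1_inv(x1)
        x0 ^= f2(x1, i + 1)
        x0 = f1_inv(x0)
        x1 ^= f2(x0, i)
    return x0, x1

# Hypothetical stand-ins (NOT AES): an invertible byte rotation for F_1 and
# an arbitrary round-dependent mixing function for F_2.
def toy_f1(a):
    return ((a << 8) | (a >> 120)) & MASK128

def toy_f1_inv(a):
    return ((a >> 8) | (a << 120)) & MASK128

def toy_f2(a, r):
    return (a * 0x9E3779B97F4A7C15 + r) & MASK128

state = (0x0123456789ABCDEF, 0xFEDCBA9876543210)
out = areion256_schedule(*state, toy_f1, toy_f2)
assert areion256_schedule_inverse(*out, toy_f1_inv, toy_f2) == state
```
]]></artwork>
        <t>Only F_1 needs an inverse; F_2 is re-evaluated in the forward direction, which is why the inverse permutation requires a single decryption-type instruction per round.</t>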
      </section>

      <section anchor="areion-512-permutation">
        <name>Areion-512 Permutation</name>
        <dl>
          <dt>Input:</dt>
          <dd>
            <t>A 512-bit block divided into four 128-bit quarters: <tt>A</tt>, <tt>B</tt>, <tt>C</tt>, and <tt>D</tt>.</t>
          </dd>
        </dl>
        <t>Procedures:</t>
        <artwork><![CDATA[
Let (x0, x1, x2, x3) = (A, B, C, D) be the four 128-bit words of the input.

For each round r from 0 to 14:
    1. x1 = x1 XOR F_0(x0)
    2. x3 = x3 XOR F_0(x2)
    3. x0 = F_1(x0)
    4. x2 = F_3^{(r)}(x2)
    
    5. Shuffle (x0, x1, x2, x3) -> (x1, x2, x3, x0) (Left Rotate).

Output is the concatenation of x0, x1, x2, and x3.
]]></artwork>
        <dl>
          <dt>Output:</dt>
          <dd>
            <t>Concatenation of <tt>x0</tt>, <tt>x1</tt>, <tt>x2</tt>, and <tt>x3</tt> after all rounds.</t>
          </dd>
        </dl>
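        <t>The control flow of the procedure above can be sketched in Python with F_0, F_1, and F_3 passed as callables on 128-bit integers. The function name and the stand-in functions used in the check are hypothetical; the sketch illustrates only the order of operations and the word rotation, not the AES-based functions.</t>
        <artwork><![CDATA[
```python
def areion512_schedule(x0, x1, x2, x3, f0, f1, f3, rounds=15):
    """Apply steps 1-5 of the round procedure for r = 0 .. rounds-1."""
    for r in range(rounds):
        x1 ^= f0(x0)                      # step 1
        x3 ^= f0(x2)                      # step 2
        x0 = f1(x0)                       # step 3
        x2 = f3(x2, r)                    # step 4 (uses round constant RC_r)
        x0, x1, x2, x3 = x1, x2, x3, x0   # step 5: rotate words left
    return x0, x1, x2, x3

# With trivial stand-ins only the word rotation remains: 15 rotations of
# four words are equivalent to three left rotations.
rounds_seen = []

def record_f3(x, r):
    rounds_seen.append(r)
    return x

result = areion512_schedule(1, 2, 3, 4, lambda x: 0, lambda x: x, record_f3)
print(result, rounds_seen)
```
]]></artwork>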
      </section>
    </section>

    <section anchor="hash-functions">
      <name>Hash Functions Based on Areion</name>
      <t>This section specifies hash functions built from the Areion permutations. Two short fixed-input hash functions, Areion256-DM and Areion512-DM, and a variable-input hash function, Areion512-MD, are defined. All constructions are based on the Davies-Meyer compression function and the Merkle-Damgård paradigm, as described in <xref target="Areion"/>.</t>

      <section>
        <name>Short Fixed-Input Hash Functions</name>
        <t>Areion256-DM and Areion512-DM are short fixed-input hash functions built from Areion-256 and Areion-512, respectively. They are intended for hashing a single 256-bit or 512-bit input block, for example short keys, nonces, or identifiers. These constructions do not define padding; callers MUST supply inputs of the required bit length.</t>

        <section>
          <name>Areion256-DM</name>
          <dl>
            <dt>Input:</dt>
            <dd>
              <t>A 256-bit string <tt>X</tt>.</t>
            </dd>
            <dt>Output:</dt>
            <dd>
              <t>A 256-bit hash value <tt>H</tt>.</t>
            </dd>
          </dl>
          <t>The input <tt>X</tt> is mapped to the two 128-bit words <tt>(L, R)</tt> used by Areion-256 as specified in <xref target="areion-256-permutation"/>.</t>
          <artwork><![CDATA[
Areion256-DM(X):
    // X is a 256-bit block mapped to (L, R)
    Y = Areion256(L || R)
    H = Y XOR X
    return H
]]></artwork>
          <t>Here, <tt>Areion256</tt> denotes the Areion-256 permutation, and all XOR operations are taken bitwise on 256-bit strings.</t>
        </section>

        <section anchor="areion512-dm">
          <name>Areion512-DM</name>
          <dl>
            <dt>Input:</dt>
            <dd>
              <t>A 512-bit string <tt>X</tt>.</t>
            </dd>
            <dt>Output:</dt>
            <dd>
              <t>A 256-bit hash value <tt>H</tt>.</t>
            </dd>
          </dl>
          <t>Let <tt>Y = Areion512(X) XOR X</tt>, where <tt>Areion512</tt> is the Areion-512 permutation defined in <xref target="areion-512-permutation"/>. Interpret <tt>Y</tt> as a sequence of sixteen 32-bit words <tt>Y = y_0 || y_1 || ... || y_15</tt>, where <tt>y_0</tt> is the most significant 32 bits. Then the output <tt>H</tt> is obtained by the truncation function:</t>
          <artwork><![CDATA[
H = y_2 || y_3 || y_6 || y_7 || y_8 || y_9 || y_12 || y_13
]]></artwork>
          <t>This truncation matches the definition used in the Areion design paper <xref target="Areion"/> for 512-bit permutations instantiated in Davies-Meyer mode.</t>
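          <t>The feed-forward and truncation steps can be sketched in Python with the permutation passed as a callable on 64-byte strings. The function names are hypothetical, and the sketch assumes that y_0 corresponds to the first four bytes of the byte string.</t>
          <artwork><![CDATA[
```python
def davies_meyer(x, permutation):
    """Return permutation(x) XOR x for a byte string x."""
    y = permutation(x)
    return bytes(a ^ b for a, b in zip(y, x))

def truncate_512_to_256(y):
    """Keep the 32-bit words y_2, y_3, y_6, y_7, y_8, y_9, y_12, y_13."""
    words = [y[4 * i:4 * i + 4] for i in range(16)]
    return b"".join(words[i] for i in (2, 3, 6, 7, 8, 9, 12, 13))

def areion512_dm(x, permutation):
    """Areion512-DM with a caller-supplied 512-bit permutation."""
    if len(x) != 64:
        raise ValueError("Areion512-DM takes exactly 512 bits of input")
    return truncate_512_to_256(davies_meyer(x, permutation))

# Word i of this test string consists of four copies of the byte i.
numbered = b"".join(bytes([i]) * 4 for i in range(16))
print(truncate_512_to_256(numbered).hex())
```
]]></artwork>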
        </section>
      </section>

      <section anchor="vil-hash-areion512-md">
        <name>Variable-Input Hash Function Areion512-MD</name>
        <t>Areion512-MD is a variable-input-length hash function built from the Areion-512 permutation using a Davies-Meyer compression function in the Merkle-Damgård framework. It outputs a 256-bit message digest and targets a 256-bit preimage security level, as in <xref target="Areion"/>.</t>

        <section>
          <name>Padding and Message Parsing</name>
          <t>Areion512-MD uses a padding method analogous to that of SHA-256, adapted to a 256-bit message block size.</t>
          <t>Given an input message <tt>M</tt> of arbitrary length <tt>len(M)</tt> bits, the padded message is computed as follows:</t>
          <ul spacing="normal">
            <li><t>Append a single bit <tt>1</tt> to the message.</t></li>
            <li><t>Append <tt>k</tt> zero bits, where <tt>k</tt> is the smallest non-negative integer such that <tt>len(M) + 1 + k + 64</tt> is a multiple of 256.</t></li>
            <li><t>Append a 64-bit big-endian representation of the original length <tt>len(M)</tt>.</t></li>
          </ul>
          <t>The resulting padded message has a bit length that is a multiple of 256. It is then parsed into <tt>t</tt> 256-bit message blocks <tt>M_0, M_1, ..., M_{t-1}</tt>; each block is treated as a 256-bit string.</t>
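          <t>For byte-aligned messages, the padding rule can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the appended <tt>1</tt> bit is the most significant bit of the byte 0x80, as in SHA-256.</t>
          <artwork><![CDATA[
```python
def areion512_md_pad(message):
    """Pad a byte string and split it into 256-bit (32-byte) blocks."""
    bit_len = 8 * len(message)
    # Smallest k >= 0 with bit_len + 1 + k + 64 divisible by 256.
    k = (-(bit_len + 1 + 64)) % 256
    # For byte-aligned input, the '1' bit and the first 7 zero bits form 0x80.
    padded = (message + b"\x80" + bytes((k - 7) // 8)
              + bit_len.to_bytes(8, "big"))
    return [padded[i:i + 32] for i in range(0, len(padded), 32)]

for sample in (b"", b"abc", bytes(23), bytes(24)):
    print(len(sample), len(areion512_md_pad(sample)))
```
]]></artwork>
          <t>A 23-byte message still fits in one block, whereas a 24-byte message needs a second block because the <tt>1</tt> bit and the 64-bit length no longer fit in the first.</t>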
        </section>

        <section>
          <name>Initial Value</name>
          <t>Areion512-MD uses a 256-bit initial hash value <tt>H^{(0)}</tt> consisting of two 128-bit words <tt>H_0</tt> and <tt>H_1</tt>. These constants are identical to the initial value of SHA-256, grouped into 128-bit words:</t>
          <artwork><![CDATA[
H0 = 0x6a09e667bb67ae853c6ef372a54ff53a
H1 = 0x510e527f9b05688c1f83d9ab5be0cd19
]]></artwork>
          <t>The pair <tt>(H_0, H_1)</tt> is used as the initial chaining value <tt>H^{(0)}</tt>.</t>
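          <t>As a cross-check, the SHA-256 initial hash words are the first 32 bits of the fractional parts of the square roots of the first eight primes; the Python sketch below recomputes them and groups them into the two 128-bit words. The function name is hypothetical.</t>
          <artwork><![CDATA[
```python
from math import isqrt

def sha256_initial_words():
    """First 32 fractional bits of sqrt(p) for the first eight primes."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19]
    # isqrt(p << 64) equals floor(sqrt(p) * 2^32); its low 32 bits are the
    # leading fractional bits of sqrt(p).
    return [isqrt(p << 64) & 0xFFFFFFFF for p in primes]

words = sha256_initial_words()
H0 = b"".join(w.to_bytes(4, "big") for w in words[:4])
H1 = b"".join(w.to_bytes(4, "big") for w in words[4:])
print(H0.hex())
print(H1.hex())
```
]]></artwork>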
        </section>

        <section>
          <name>Compression Function</name>
          <t>The compression function of Areion512-MD instantiates the Davies-Meyer construction with the Areion-512 permutation as follows. Each 256-bit message block <tt>M_i</tt> and the current 256-bit chaining value <tt>H^{(i)}</tt> are combined into a 512-bit input to Areion-512.</t>
          <dl>
            <dt>Input to compression for block <tt>i</tt>:</dt>
            <dd>
              <t>The pair <tt>(M_i, H^{(i)})</tt>, where both are 256-bit strings.</t>
            </dd>
            <dt>Output of compression for block <tt>i</tt>:</dt>
            <dd>
              <t>The next chaining value <tt>H^{(i+1)}</tt>, a 256-bit string.</t>
            </dd>
          </dl>
          <t>The 512-bit input to Areion-512 is formed by concatenating <tt>M_i</tt> and <tt>H^{(i)}</tt> and mapping them to the four 128-bit words <tt>(A, B, C, D)</tt> of Areion-512 as follows:</t>
          <artwork><![CDATA[
A || B = M_i
C || D = H^(i)
]]></artwork>
          <t>The compression function updates the chaining value as follows:</t>
        <artwork><![CDATA[
    H^(i+1) = Areion512-DM(A || B || C || D)
            = Areion512-DM(M_i || H^(i))
  ]]></artwork>
        <t>where Areion512-DM is the function defined in <xref target="areion512-dm"/>.</t>
          <t>All concatenations above are in big-endian bit order. The XOR inside Areion512-DM is taken bitwise on 512-bit strings, before truncation to 256 bits.</t>
        </section>

        <section>
          <name>Finalization and Output</name>
          <t>After processing the last message block <tt>M_{t-1}</tt>, the final chaining value <tt>H^{(t)}</tt> is returned as the 256-bit hash value:</t>
          <artwork><![CDATA[
Areion512-MD(M) = H^(t)
]]></artwork>
          <t>No additional finalization transformation is applied beyond what is specified above.</t>
        </section>
      </section>
    </section>

    <section anchor="permutation-based-aead-with-areion">
      <name>Permutation-based AEAD with Areion</name>
      <t>This section describes an authenticated encryption scheme based on the Areion permutations: a variant of the Offset Public Permutation (OPP) mode instantiated with Areion-256, referred to as Areion256-OPP.</t>
      <t>The OPP mode itself is specified in <xref target="OPP-eprint"/>; this document fixes the permutation, masking function, and key/nonce formatting for the Areion-based instantiation.</t>
      
      <section anchor="OPP-parameters">
        <name>Parameters</name>
        <t>The parameters for Areion256-OPP are defined as follows:</t>
        <ul spacing="normal">
          <li><t>Underlying permutation: Areion-256 as specified in <xref target="areion-256-permutation"/>.</t></li>
          <li><t>Block size <tt>b</tt>: 256 bits.</t></li>
          <li><t>Word size <tt>w</tt>: 64 bits.</t></li>
          <li><t>Number of words in LFSR state <tt>n</tt>: 4 (the masking state is (x0, x1, x2, x3)).</t></li>
          <li><t>Recommended key sizes: 128 or 256 bits.</t></li>
          <li><t>Nonce size: 128 bits.</t></li>
          <li><t>Tag length: 256 bits.</t></li>
        </ul>
      </section>
      <section anchor="OPP-masking">
        <name>Masking Function</name>
        <t>The masking function of the Masked Even-Mansour (MEM) construction used inside OPP is implemented by a word-oriented LFSR over 256-bit states. Each state consists of four 64-bit words (x0, x1, x2, x3). The LFSR update function φ is defined as follows:</t>
        <artwork><![CDATA[
φ : (x0, x1, x2, x3) ↦ (x1, x2, x3, (x0 <<< 3) XOR (x3 >> 5))
]]></artwork>
        <t>Here, rotation to the left (<tt>&lt;&lt;&lt;</tt>) and logical right shift (<tt>&gt;&gt;</tt>) are taken on 64-bit words. For Areion256-OPP, <tt>&gt;&gt; 5</tt> MUST be interpreted as a logical right shift, not a rotation; this matches the reference implementation and the test vectors.</t>
      </section>
      <section anchor="OPP-key-nonce-format">
        <name>Key and Nonce Formatting</name>
        <t>Let N denote the 128-bit nonce and K denote the secret key. The underlying permutation in the MEM construction takes a 256-bit input which is mapped to the two 128-bit words (L, R) for Areion-256.</t>
        <ul spacing="normal">
          <li><t>For a 128-bit key K, the initial 256-bit input to Areion-256 is N || K, where N is the most-significant 128 bits and K is the least-significant 128 bits. This concatenation is mapped to the two 128-bit words (L, R) = (N, K).</t></li>
          <li>
            <t>For a 256-bit key K, the following initialization is RECOMMENDED:</t>
            <artwork><![CDATA[
L || R = (N || 0^128) XOR K
]]></artwork>
            <t>Here, 0^128 denotes the 128-bit all-zero value. In other words, the 256-bit quantity (N || 0^128) is XORed with the 256-bit key K, and the result is assigned to (L, R).</t>
          </li>
        </ul>
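        <t>The two formatting rules can be sketched in Python on byte strings. The function name is hypothetical, and the sketch assumes that the most-significant 128 bits are the first 16 bytes of the 32-byte string.</t>
        <artwork><![CDATA[
```python
def initial_state(nonce, key):
    """Map (N, K) to the two 128-bit words (L, R) fed to Areion-256."""
    if len(nonce) != 16:
        raise ValueError("nonce must be 128 bits")
    if len(key) == 16:
        block = nonce + key                       # N || K
    elif len(key) == 32:
        padded_nonce = nonce + bytes(16)          # N || 0^128
        block = bytes(a ^ b for a, b in zip(padded_nonce, key))
    else:
        raise ValueError("key must be 128 or 256 bits")
    return block[:16], block[16:]

nonce = b"\x11" * 16
print(initial_state(nonce, b"\x22" * 16))
print(initial_state(nonce, b"\xff" * 32))
```
]]></artwork>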
        <t>The following subsections describe the full encryption and decryption algorithms, derived from the generic OPP specification <xref target="OPP-eprint"/> and instantiated specifically for Areion-256.</t>

        <section anchor="OPP-functions">
          <name>Helper Functions</name>
          <t>We define the following helper functions on the 256-bit state S = (x0, x1, x2, x3):</t>
          <dl>
            <dt>φ(S)</dt>
            <dd>The LFSR update function defined in <xref target="OPP-masking"/>: (x1, x2, x3, (x0 &lt;&lt;&lt; 3) XOR (x3 &gt;&gt; 5)).</dd>
            <dt>β(S)</dt>
            <dd>Defined as φ(S) XOR S.</dd>
            <dt>γ(S)</dt>
            <dd>Defined as φ(φ(S)) XOR φ(S) XOR S.</dd>
            <dt>Pad(X)</dt>
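            <dd>Appends a single '1' bit to X, followed by the minimum number of '0' bits to make the length a multiple of 256 bits. For byte strings, this is the byte <tt>0x01</tt> followed by <tt>0x00</tt> bytes, as stated in <xref target="OPP-encryption"/>.</dd>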
          </dl>
          <t>Note that β(S) effectively multiplies the state polynomial by (x+1), and γ(S) multiplies by (x^2+x+1).</t>
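          <t>A minimal Python sketch of φ, β, and γ on tuples of four 64-bit integers is given below; the function names are hypothetical. Because φ is linear over XOR, the sketch also checks the identity β(β(S)) = φ(φ(S)) XOR S, which corresponds to (x+1)^2 = x^2+1 in characteristic 2.</t>
          <artwork><![CDATA[
```python
MASK64 = (1 << 64) - 1

def rotl64(x, n):
    """Rotate a 64-bit word left by n bits."""
    return ((x << n) | (x >> (64 - n))) & MASK64

def phi(s):
    """LFSR update: (x0, x1, x2, x3) -> (x1, x2, x3, (x0 <<< 3) ^ (x3 >> 5))."""
    x0, x1, x2, x3 = s
    return (x1, x2, x3, rotl64(x0, 3) ^ (x3 >> 5))

def xor_state(a, b):
    return tuple(u ^ v for u, v in zip(a, b))

def beta(s):
    """beta(S) = phi(S) XOR S."""
    return xor_state(phi(s), s)

def gamma(s):
    """gamma(S) = phi(phi(S)) XOR phi(S) XOR S."""
    return xor_state(xor_state(phi(phi(s)), phi(s)), s)

s = (1, 2, 3, 4)
print(phi(s), beta(s))
assert beta(beta(s)) == xor_state(phi(phi(s)), s)
```
]]></artwork>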

          <t><strong>MEM Helper:</strong> We define the helper function <tt>MEM(X, Y)</tt>, which corresponds to the MEM (Mask-Encrypt-Mask) operation used in OPP:</t>
          <artwork><![CDATA[
   MEM(X, Y) = Areion-256(X XOR Y) XOR Y
   MEM-Inverse(X, Y) = Areion-256-Inverse(X XOR Y) XOR Y
]]></artwork>
          <t>where <tt>Areion-256-Inverse</tt> denotes the inverse permutation of Areion-256.</t>
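          <t>The two helpers can be sketched in Python with the permutation and its inverse passed as callables on 32-byte strings. The names and the byte-reversal stand-in are hypothetical; the stand-in is not Areion-256 and is used only to check that <tt>MEM-Inverse</tt> undoes <tt>MEM</tt> under the same mask.</t>
          <artwork><![CDATA[
```python
def xor_bytes(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def mem(x, y, permutation):
    """MEM(X, Y) = P(X XOR Y) XOR Y."""
    return xor_bytes(permutation(xor_bytes(x, y)), y)

def mem_inverse(x, y, permutation_inverse):
    """MEM-Inverse(X, Y) = P^-1(X XOR Y) XOR Y."""
    return xor_bytes(permutation_inverse(xor_bytes(x, y)), y)

def toy_permutation(block):
    """Hypothetical stand-in (NOT Areion-256): self-inverse byte reversal."""
    return block[::-1]

x = bytes(range(32))
y = bytes([0xA5]) * 32
c = mem(x, y, toy_permutation)
assert mem_inverse(c, y, toy_permutation) == x
```
]]></artwork>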
        </section>

        <section anchor="OPP-encryption">
          <name>Areion256-OPP Encryption</name>
          <t><strong>Input:</strong> Key K (128/256 bits), Nonce N (128 bits), Associated Data A, Message M.<br/>
          <strong>Output:</strong> Ciphertext C, Tag T (256 bits).</t>

          <t><strong>Byte Ordering:</strong> All conversions between byte strings and 64-bit words use little-endian byte ordering.</t>

          <t><strong>Padding:</strong> The padding function <tt>Pad(X)</tt> appends a single byte <tt>0x01</tt> to the input <tt>X</tt>, followed by the minimum number of <tt>0x00</tt> bytes to make the length a multiple of 256 bits.</t>

          <t><strong>Initialization:</strong></t>
          <artwork><![CDATA[
   // Map (K, N) to initial mask La
   If |K| == 128: S_init = Nonce || Key
   Else (|K| == 256): S_init = (Nonce || 0^128) XOR Key
   La = Areion-256(S_init)
   Le = γ(La)
   
   // Initialize accumulators
   Sa = 0^256 (Associated Data accumulator)
   Se = 0^256 (Encryption accumulator)
]]></artwork>
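          <t>The construction of <tt>S_init</tt> can be illustrated with the following non-normative C sketch. It assumes that <tt>||</tt> denotes byte-string concatenation with the nonce in the first 16 bytes. The function name <tt>opp_build_s_init</tt> and its return convention are assumptions of the sketch. The subsequent steps La = Areion-256(S_init) and Le = γ(La) are not shown.</t>
          <artwork type="C"><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the 32-byte S_init from a 16-byte nonce and a 16- or 32-byte key.
 *   |K| = 128: S_init = Nonce || Key
 *   |K| = 256: S_init = (Nonce || 0^128) XOR Key
 * Returns 0 on success, or -1 for an unsupported key length.
 */
static int opp_build_s_init(uint8_t s_init[32], const uint8_t nonce[16],
                            const uint8_t *key, size_t key_len) {
    if (key_len == 16) {
        memcpy(s_init, nonce, 16);
        memcpy(s_init + 16, key, 16);
        return 0;
    }
    if (key_len == 32) {
        memcpy(s_init, nonce, 16);
        memset(s_init + 16, 0x00, 16);
        for (size_t i = 0; i < 32; i++) {
            s_init[i] ^= key[i];
        }
        return 0;
    }
    return -1;
}
]]></artwork>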

          <t><strong>Processing Associated Data A:</strong></t>
          <t>All full 256-bit blocks of A are absorbed first, and the mask is advanced with φ after each block. If a partial block remains, β is applied to the mask, the remainder is padded with <tt>Pad</tt>, and the padded block is absorbed. This follows the reference functions <tt>opp_absorb_data</tt> and <tt>opp_absorb_lastblock</tt>. If no partial block remains (including when A is empty), nothing further is absorbed.</t>
          <artwork><![CDATA[
   // Logic matching ref `opp_absorb_data` and `opp_absorb_lastblock`
   Mask = La
   While |A| >= 256:
       Block = A[0..255]
       // Same as Sa = Sa XOR MEM(Block, Mask)
       Sa = Sa XOR Areion-256(Block XOR Mask) XOR Mask
       Mask = φ(Mask)
       A = A[256..end]
   
   If |A| > 0 (Partial Remainder):
       Mask = β(Mask)
       PadA = Pad(A)
       // Same as Sa = Sa XOR MEM(PadA, Mask)
       Sa = Sa XOR Areion-256(PadA XOR Mask) XOR Mask
       Mask = φ(Mask)
]]></artwork>
          <t><strong>Encryption of Message M:</strong></t>
          <artwork><![CDATA[
   // Se accumulates message checksum, Le is mask
   Mask = Le
   While |M| >= 256:
       Block = M[0..255]
       // Encrypt Block using MEM
       // Out = Areion-256(Block XOR Mask) XOR Mask
       Out = MEM(Block, Mask)
       Append Out to Ciphertext
       // Accumulate Plaintext
       Se = Se XOR Block
       Mask = φ(Mask)
       M = M[256..end]
   
   If |M| > 0 (Partial Remainder):
       Mask = β(Mask)
       PadM = Pad(M)
       // Encrypt Partial
       // Keystream generation: Encrypt Zero-Block with Mask
       Keystream = MEM(0^256, Mask)
       Out = Keystream XOR PadM
       Append Out[0..|M|-1] to Ciphertext
       // Accumulate Padded Plaintext
       Se = Se XOR PadM
]]></artwork>
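          <t>The handling of the final partial block can be sketched in C as follows. This is a non-normative illustration: the 32-byte keystream MEM(0^256, Mask) is assumed to be computed by the caller, and the function name <tt>opp_encrypt_partial</tt> and its parameters are assumptions of the sketch.</t>
          <artwork type="C"><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encrypt a final partial block m of len bytes (0 < len < 32).
 *   keystream: MEM(0^256, Mask), supplied by the caller
 *   se:        running checksum, updated with the padded plaintext
 *   c:         receives exactly len ciphertext bytes
 * Returns 0 on success, or -1 if len is not a partial-block length.
 */
static int opp_encrypt_partial(uint8_t *c, uint8_t se[32],
                               const uint8_t keystream[32],
                               const uint8_t *m, size_t len) {
    uint8_t padm[32] = {0};
    if (len == 0 || len >= 32) {
        return -1;
    }
    memcpy(padm, m, len);
    padm[len] = 0x01;                    /* PadM = Pad(M) */
    for (size_t i = 0; i < len; i++) {   /* Out[0..|M|-1] */
        c[i] = (uint8_t)(keystream[i] ^ padm[i]);
    }
    for (size_t i = 0; i < 32; i++) {    /* Se = Se XOR PadM */
        se[i] ^= padm[i];
    }
    return 0;
}
]]></artwork>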

          <t><strong>Finalization (Tag Generation):</strong></t>
          <artwork><![CDATA[
   // Calculate Tag Mask M_tag
   // Ref: `opp_finalise`: m = beta(beta(mask))
   
   M_tag = β(β(Mask))
   
   // Tag = Sa XOR MEM(Se, M_tag)
   // Block = Areion-256(Se XOR M_tag) XOR M_tag
   Block = MEM(Se, M_tag)
   Tag = Sa XOR Block
   
   Return (Ciphertext, Tag)
]]></artwork>
        </section>

        <section anchor="OPP-decryption">
          <name>Areion256-OPP Decryption</name>
          <t>Decryption proceeds similarly to encryption, but uses the <tt>Areion-256-Inverse</tt> permutation for full blocks to recover the message, consistent with the OPP mode specification <xref target="OPP-eprint"/>.</t>
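          <t>For each full ciphertext block C_j, the plaintext block is recovered as M_j = MEM-Inverse(C_j, Mask). For the final partial block, the keystream is regenerated with the forward permutation as MEM(0^256, Mask) and XORed with the partial ciphertext block to recover the corresponding plaintext bytes. The receiver then recomputes the tag from the associated data and the recovered plaintext, as in encryption. Verification fails if the calculated tag does not match the received tag.</t>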
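          <t>The following non-normative C sketch illustrates two receiver-side steps: recovering a final partial block from a caller-supplied keystream, and comparing tags. The constant-time comparison is common practice for AEAD implementations rather than a requirement stated in this document. The function names <tt>opp_decrypt_partial</tt> and <tt>opp_tag_verify</tt> are assumptions of the sketch, as is the update of the checksum with the padded recovered plaintext, which mirrors the encryption side.</t>
          <artwork type="C"><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Recover a final partial block of len bytes (0 < len < 32).
 *   keystream: MEM(0^256, Mask), supplied by the caller
 *   se:        running checksum, updated with the padded plaintext
 * Returns 0 on success, or -1 if len is not a partial-block length.
 */
static int opp_decrypt_partial(uint8_t *m, uint8_t se[32],
                               const uint8_t keystream[32],
                               const uint8_t *c, size_t len) {
    uint8_t padm[32] = {0};
    if (len == 0 || len >= 32) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        padm[i] = (uint8_t)(c[i] ^ keystream[i]);
    }
    padm[len] = 0x01;                    /* re-apply Pad to the plaintext */
    memcpy(m, padm, len);
    for (size_t i = 0; i < 32; i++) {
        se[i] ^= padm[i];
    }
    return 0;
}

/*
 * Compare two 32-byte tags without data-dependent branches.
 * Returns 0 if the tags are equal and -1 otherwise.
 */
static int opp_tag_verify(const uint8_t expected[32],
                          const uint8_t received[32]) {
    uint8_t diff = 0;
    for (size_t i = 0; i < 32; i++) {
        diff |= (uint8_t)(expected[i] ^ received[i]);
    }
    /* diff == 0 maps to 0; any non-zero diff maps to -1 */
    return (int)((((unsigned)diff - 1u) >> 8) & 1u) - 1;
}
]]></artwork>
          <t>In such a design, the caller would discard the recovered plaintext whenever <tt>opp_tag_verify</tt> returns -1.</t>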
        </section>
      </section>
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>The security of Areion and its applications is analyzed in detail in <xref target="Areion"/>. This section summarizes the security claims and findings.</t>
      <section anchor="security-claims">
        <name>Security Claims</name>
        <ul spacing="normal">
          <li><t><strong>Permutations (Areion-256, Areion-512):</strong> The permutations are claimed to provide 128-bit security as public permutations.</t></li>
          <li><t><strong>SFIL Hash (Areion256-DM, Areion512-DM):</strong> These hash functions claim 256-bit security against preimage attacks. Collision resistance is not claimed as it is generally not required for their intended applications (e.g., hash-based signatures).</t></li>
          <li><t><strong>VIL Hash (Areion512-MD):</strong> This hash function claims 256-bit security against preimage attacks and 128-bit security against collision attacks, matching the security level of SHA-256.</t></li>
        </ul>
      </section>
      <section anchor="aead-security">
        <name>AEAD Security</name>
        <t>Areion256-OPP provides confidentiality and integrity for messages under the assumption that Areion-256 behaves as a pseudorandom permutation. The security properties can be described using the terminology in <xref target="RFC9771"/>.</t>
        <ul spacing="normal">
          <li><strong>Nonce Misuse:</strong> OPP is a nonce-respecting mode. If a nonce is repeated with the same key, confidentiality is lost for the colliding blocks (similar to ECB mode for those blocks), and authenticity is completely compromised. Implementations MUST ensure nonces are unique.</li>
          <li><strong>Related-Key Attacks:</strong> As noted in <xref target="OPP-key-nonce-format"/>, the 256-bit key initialization <tt>(N || 0^128) XOR K</tt> allows trivial related-key attacks. If an attacker can control specific bit-flips in both the Key and the Nonce, they can force the same initial state. Usage of Areion256-OPP in protocols that allow related-key queries is NOT RECOMMENDED.</li>
          <li><strong>Usage Limits:</strong> The birthday bound for a 256-bit block corresponds to 128-bit security, so no more than 2^128 blocks should be processed under a single key. With 32-byte blocks this amounts to 2^133 bytes, a limit that is unreachable in practice.</li>
        </ul>
      </section>

      <section anchor="security-analysis">
        <name>Cryptanalysis Summary</name>
        <t>The full-round versions of Areion-256 (10 rounds) and Areion-512 (15 rounds) provide sufficient security margins against known attacks.</t>
        <ul spacing="normal">
          <li><t><strong>Differential/Linear Attacks:</strong> The number of active S-boxes in reduced-round variants grows quickly, satisfying the 128-bit security threshold (22 active S-boxes) at 4 rounds for Areion-256 and 6 rounds for Areion-512, providing a large margin for the full-round versions.</t></li>
          <li><t><strong>Impossible Differential Attacks:</strong> The longest impossible differential distinguishers found are for 4 rounds of Areion-256 and 8 rounds of Areion-512, well below the full round counts.</t></li>
          <li><t><strong>Integral and Zero-Sum Attacks:</strong> The most effective integral-style attack is the zero-sum distinguisher. Such distinguishers have been found for 5 rounds of Areion-256 (with 2^32 data/time) and 10 rounds of Areion-512 (with 2^32 data/time). These are the attacks with the deepest penetration, and the full round counts are set to provide a 2x or 1.5x margin over them, respectively.</t></li>
          <li><t><strong>MITM Preimage Attacks:</strong> For the DM hash constructions, meet-in-the-middle preimage attacks were found on 5-round Areion256-DM and 10-round Areion512-DM. This confirms the security margins of the 10-round and 15-round permutations used in the final hash constructions.</t></li>
        </ul>
      </section>
    </section>

    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>

  </middle>
  <back>
    <references>
      <name>References</name>
      <references anchor="sec-normative-references">
        <name>Normative References</name>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner"/>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba"/>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
        <reference anchor="Areion" target="https://eprint.iacr.org/2023/794.pdf">
          <front>
            <title>Areion: Highly-Efficient Permutations and Its Applications (Extended Version)</title>
            <author initials="T." surname="Isobe" fullname="Takanori Isobe">
              <organization>University of Hyogo</organization>
            </author>
            <author initials="R." surname="Ito" fullname="Ryoma Ito">
              <organization>National Institute of Information and Communications Technology</organization>
            </author>
            <author initials="F." surname="Liu" fullname="Fukang Liu">
              <organization>University of Hyogo</organization>
            </author>
            <author initials="K." surname="Minematsu" fullname="Kazuhiko Minematsu">
              <organization>NEC</organization>
            </author>
            <author initials="M." surname="Nakahashi" fullname="Motoki Nakahashi">
              <organization>University of Hyogo</organization>
            </author>
            <author initials="K." surname="Sakamoto" fullname="Kosei Sakamoto">
              <organization>University of Hyogo</organization>
            </author>
            <author initials="R." surname="Shiba" fullname="Rentaro Shiba">
              <organization>Mitsubishi Electric Corporation</organization>
            </author>
            <date year="2023"/>
          </front>
        </reference>
        <reference anchor="OPP-eprint" target="https://infoscience.epfl.ch/server/api/core/bitstreams/9580a315-f12f-482c-b3ef-591411d8c65c/content">
          <front>
            <title>Improved Masking for Tweakable Blockciphers with Applications to Authenticated Encryption</title>
            <author initials="R." surname="Granger" fullname="Robert Granger">
              <organization>École polytechnique fédérale de Lausanne</organization>
            </author>
            <author initials="P." surname="Jovanovic" fullname="Philipp Jovanovic">
              <organization>University of Passau</organization>
            </author>
            <author initials="B." surname="Mennink" fullname="Bart Mennink">
              <organization>KU Leuven</organization>
            </author>
            <author initials="S." surname="Neves" fullname="Samuel Neves">
              <organization>University of Coimbra</organization>
            </author>
            <date year="2016"/>
          </front>
        </reference>
      </references>
      <references anchor="sec-informative-references">
        <name>Informative References</name>
        <reference anchor="SP800-38A" target="https://doi.org/10.6028/NIST.SP.800-38A">
          <front>
            <title>Recommendation for Block Cipher Modes of Operation: Methods and Techniques</title>
            <author initials="M." surname="Dworkin" fullname="Morris Dworkin">
              <organization>National Institute of Standards and Technology</organization>
            </author>
            <date year="2001"/>
          </front>
        </reference>
        <reference anchor="PublicCommentOnSP800-38A" target="https://csrc.nist.gov/CSRC/media/Projects/crypto-publication-review-project/documents/initial-comments/sp800-38a-initial-public-comments-2021.pdf">
          <front>
            <title>PUBLIC COMMENTS ON SP 800-38A, Recommendation for Block Cipher Modes of Operation: Methods and Techniques, Annex C: Ciphertext Stealing Modes and Related Issues Addendum, Three Variants of Ciphertext Stealing for CBC Mode</title>
            <author>
              <organization>National Institute of Standards and Technology</organization>
            </author>
            <date year="2021"/>
          </front>
        </reference>
        <reference anchor="RFC5116" target="https://www.rfc-editor.org/info/rfc5116">
          <front>
            <title>An Interface and Algorithms for Authenticated Encryption</title>
            <author initials="D." surname="McGrew" fullname="D. McGrew"/>
            <date year="2008" month="January"/>
          </front>
          <seriesInfo name="RFC" value="5116"/>
          <seriesInfo name="DOI" value="10.17487/RFC5116"/>
        </reference>
        <reference anchor="RFC9771" target="https://www.rfc-editor.org/info/rfc9771">
          <front>
            <title>Properties of Authenticated Encryption with Associated Data (AEAD) Algorithms</title>
            <author>
              <organization>IRTF</organization>
            </author>
            <date year="2025" month="May"/>
          </front>
          <seriesInfo name="RFC" value="9771"/>
        </reference>
      </references>
    </references>
    
    <section anchor="example-implementation">
      <name>Example Implementation</name>
      <t>The following C code provides a reference implementation for the Areion-256 and Areion-512 permutations using Intel AES-NI intrinsics, as derived from Appendix A.1 of <xref target="Areion"/>. This code corresponds to the optimal (2, 1)-perm and (0, 1, 0, 3, pi_1)-perm choices identified in the paper, which match the F-function definitions in <xref target="functions"/>.</t>
      
      <t>Note: The implementation in <xref target="Areion"/> uses macros that directly map to the chosen (2,1)-perm and (0,1,0,3,pi_1)-perm. The code below is a more direct translation of the algorithmic descriptions in <xref target="areion-256-permutation"/> and <xref target="areion-512-permutation"/> using the function definitions from <xref target="functions"/> for clarity.</t>
      
      <artwork type="C"><![CDATA[
#include <stdint.h>
#include <immintrin.h>

/* Round Constants (from Table 1) */
/* Stored as 15 x 128-bit values */
const uint32_t RC[15][4] = {
    {0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344},
    {0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89},
    {0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c},
    {0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917},
    {0x9216d5d9, 0x8979fb1b, 0xd1310ba6, 0x98dfb5ac},
    {0x2ffd72db, 0xd01adfb7, 0xb8e1afed, 0x6a267e96},
    {0xba7c9045, 0xf12c7f99, 0x24a19947, 0xb3916cf7},
    {0x801f2e28, 0x58efc166, 0x36920d87, 0x1574e690},
    {0xa458fea3, 0xf4933d7e, 0x0d95748f, 0x728eb658},
    {0x718bcd58, 0x82154aee, 0x7b54a41d, 0xc25a59b5},
    {0x9c30d539, 0x2af26013, 0xc5d1b023, 0x286085f0},
    {0xca417918, 0xb8db38ef, 0x8e79dcb0, 0x603a180e},
    {0x6c9e0e8b, 0xb01e8a3e, 0xd71577c1, 0xbd314b27},
    {0x78af2fda, 0x55605c60, 0xe65525f3, 0xaa55ab94},
    {0x57489862, 0x63e81440, 0x55ca396a, 0x2aab10b6}
};

/* Load constant RC_r for little-endian byte order */
#define RC_LOAD(i) _mm_setr_epi32(RC[i][3], RC[i][2], RC[i][1], RC[i][0])
/* Zero constant for F1 and F0 */
#define RC_ZERO    _mm_setzero_si128()

/* F_0(x) = MC(SR(SB(x))) */
static inline __m128i F_0(__m128i x) {
    return _mm_aesenc_si128(x, RC_ZERO);
}

/* F_1(x) = SR(SB(x)) */
static inline __m128i F_1(__m128i x) {
    return _mm_aesenclast_si128(x, RC_ZERO);
}

/* F_2^{(r)}(x) =
     MC(SR(SB( AC(MC(SR(SB(x))), RC_r) ))) */
static inline __m128i F_2(__m128i x, int r) {
    __m128i tmp = _mm_aesenc_si128(x, RC_LOAD(r));
    return _mm_aesenc_si128(tmp, RC_ZERO);
}

/* F_3^{(r)}(x) =
     MC(SR(SB( AC(SR(SB(x)), RC_r) ))) */
static inline __m128i F_3(__m128i x, int r) {
    __m128i tmp = _mm_aesenclast_si128(x, RC_LOAD(r));
    return _mm_aesenc_si128(tmp, RC_ZERO);
}


/* Areion-256 Permutation */
void permute_areion_256(__m128i state[2])
{
    __m128i L = state[0];
    __m128i R = state[1];
    __m128i T;

    for (int i = 0; i < 10; ++i) {
        /*
           Round function logic matching Figure 1(a):
           x0 (L) feeds F2, output XORed to x1 (R).
           x0 (L) feeds F1, output becomes new x0.
           Then swap.
        */

        /* 1. T = F_2^{(r)}(L) */
        T = F_2(L, i);
        /* 2. R = R ^ T */
        R = _mm_xor_si128(R, T);
        /* 3. L = F_1(L) */
        L = F_1(L);

        /* Swap L and R for next round, except last */
        if (i < 9) {
            T = L; L = R; R = T;
        }
    }
    
    state[0] = L;
    state[1] = R;
}

/* Areion-512 Permutation */
void permute_areion_512(__m128i state[4])
{
    __m128i A = state[0], B = state[1], C = state[2], D = state[3];
    __m128i tmp;

    for (int i = 0; i < 15; ++i) {
        /*
          Ref Logic: Simultaneous Update
          B = F_0(A) ^ B  => aesenc(A, B)
          D = F_0(C) ^ D  => aesenc(C, D)
          A = F_1(A)      => aesenclast(A, 0)
          C = F_3(C, RC)  => aesenc(aesenclast(C, RC), 0)
        */

        __m128i next_B = _mm_aesenc_si128(A, B);
        __m128i next_D = _mm_aesenc_si128(C, D);
        __m128i next_A = _mm_aesenclast_si128(A, RC_ZERO);
        
        __m128i tmp_C  = _mm_aesenclast_si128(C, RC_LOAD(i));
        __m128i next_C = _mm_aesenc_si128(tmp_C, RC_ZERO);
        
        A = next_A;
        B = next_B;
        C = next_C;
        D = next_D;

        /* Shuffle (A,B,C,D) -> (B,C,D,A) (Left Rotate) */
        /* Note: Matches test vectors when applied in all rounds */
        tmp = A; A = B; B = C; C = D; D = tmp;
    }
    
    state[0] = A;
    state[1] = B;
    state[2] = C;
    state[3] = D;
}
]]></artwork>
    </section>
    
    <section anchor="test-cases-test-vectors">
      <name>Test Cases &amp; Test Vectors</name>
      <t>This section provides test vectors for the Areion permutations, hash functions, and the Areion256-OPP AEAD scheme. These vectors are derived from Appendix B of the Areion design paper <xref target="Areion"/> and are provided here so that implementations can be validated using this document alone.</t>

      <section>
        <name>Areion-256 Permutation</name>
        <t>The following test vectors apply the Areion-256 permutation to a 256-bit input block.</t>
        <artwork><![CDATA[
/* test vector #1 */
Input:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Output:
28 12 a7 24 65 b2 6e 9f ca 75 83 f6 e4 12 3a a1
49 0e 35 e7 d5 20 3e 4b a2 e9 27 b0 48 2f 4d b8

/* test vector #2 */
Input:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Output:
68 84 5f 13 2e e4 61 60 66 c7 02 d9 42 a3 b2 c3
a3 77 f6 5b 13 bb 05 c7 cd 1f b2 9c 89 af a1 85
]]></artwork>
      </section>

      <section>
        <name>Areion-512 Permutation</name>
        <t>The following test vectors apply the Areion-512 permutation to a 512-bit input block.</t>
        <artwork><![CDATA[
/* test vector #1 */
Input:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Output:
b2 ad b0 4f a9 1f 90 15 59 36 71 22 cb 3c 96 a9
78 cf 3e e4 b7 3c 6a 54 3f e6 dc 85 77 91 02 e7
e3 f5 50 10 16 ce ed 1d d2 c4 8d 0b c2 12 fb 07
ad 16 87 94 bd 96 cf f3 59 09 cd d8 e2 27 49 28

/* test vector #2 */
Input:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
Output:
b6 90 b8 82 97 ec 47 0b 07 dd a9 2b 91 95 9c ff
13 5e 9a c5 fc 3d c9 b6 47 a4 3f 4d aa 8d a7 a4
e0 af bd d8 e6 e2 55 c2 45 27 73 6b 29 8b d6 1d
e4 60 ba b9 ea 79 15 c6 d6 dd be 05 fe 8d de 40
]]></artwork>
      </section>

      <section>
        <name>Areion256-DM</name>
        <t>The Areion256-DM hash function is the Davies-Meyer construction using the Areion-256 permutation. The following test vectors give the 256-bit hash value H = Areion-256(X) XOR X for a single 256-bit input block X.</t>
        <artwork><![CDATA[
/* test vector #1 */
Input:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Output:
28 12 a7 24 65 b2 6e 9f ca 75 83 f6 e4 12 3a a1
49 0e 35 e7 d5 20 3e 4b a2 e9 27 b0 48 2f 4d b8

/* test vector #2 */
Input:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Output:
68 85 5d 10 2a e1 67 67 6e ce 08 d2 4e ae bc cc
b3 66 e4 48 07 ae 13 d0 d5 06 a8 87 95 b2 bf 9a
]]></artwork>
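        <t>As a non-normative consistency check, the Davies-Meyer feed-forward can be applied to the Areion-256 permutation output of test vector #2 and compared with the hash value above. The sketch below does not implement the permutation itself; it only XORs the published permutation output with the input. The identifiers <tt>PERM256_OUT_V2</tt>, <tt>DM256_OUT_V2</tt>, <tt>dm256_feedforward</tt>, and <tt>dm256_vector2_ok</tt> are assumptions of the sketch.</t>
        <artwork type="C"><![CDATA[
#include <stdint.h>
#include <string.h>

/* Areion-256 permutation output for input 00 01 ... 1f (vector #2). */
static const uint8_t PERM256_OUT_V2[32] = {
    0x68, 0x84, 0x5f, 0x13, 0x2e, 0xe4, 0x61, 0x60,
    0x66, 0xc7, 0x02, 0xd9, 0x42, 0xa3, 0xb2, 0xc3,
    0xa3, 0x77, 0xf6, 0x5b, 0x13, 0xbb, 0x05, 0xc7,
    0xcd, 0x1f, 0xb2, 0x9c, 0x89, 0xaf, 0xa1, 0x85
};

/* Areion256-DM output for the same input (vector #2). */
static const uint8_t DM256_OUT_V2[32] = {
    0x68, 0x85, 0x5d, 0x10, 0x2a, 0xe1, 0x67, 0x67,
    0x6e, 0xce, 0x08, 0xd2, 0x4e, 0xae, 0xbc, 0xcc,
    0xb3, 0x66, 0xe4, 0x48, 0x07, 0xae, 0x13, 0xd0,
    0xd5, 0x06, 0xa8, 0x87, 0x95, 0xb2, 0xbf, 0x9a
};

/* H = P(X) XOR X, given the permutation output p = P(X). */
static void dm256_feedforward(uint8_t h[32], const uint8_t p[32],
                              const uint8_t x[32]) {
    for (int i = 0; i < 32; i++) {
        h[i] = (uint8_t)(p[i] ^ x[i]);
    }
}

/* Returns 1 if the two published vectors are mutually consistent. */
static int dm256_vector2_ok(void) {
    uint8_t x[32], h[32];
    for (int i = 0; i < 32; i++) {
        x[i] = (uint8_t)i;               /* input 00 01 ... 1f */
    }
    dm256_feedforward(h, PERM256_OUT_V2, x);
    return memcmp(h, DM256_OUT_V2, 32) == 0;
}
]]></artwork>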
      </section>

      <section>
        <name>Areion512-DM</name>
        <t>The Areion512-DM hash function uses the Areion-512 permutation in Davies-Meyer mode and truncates the 512-bit output as specified in this document. The following test vectors provide the 256-bit hash value for a single 512-bit input block.</t>
        <artwork><![CDATA[
/* test vector #1 */
Input:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Output:
59 36 71 22 cb 3c 96 a9 3f e6 dc 85 77 91 02 e7
e3 f5 50 10 16 ce ed 1d ad 16 87 94 bd 96 cf f3

/* test vector #2 */
Input:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
Output:
0f d4 a3 20 9d 98 92 f0 5f bd 25 56 b6 90 b9 bb
c0 8e 9f fb c2 c7 73 e5 d4 51 88 8a de 4c 23 f1
]]></artwork>
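        <t>The relationship between the Areion-512 permutation vectors and the hash values above can be illustrated with the following non-normative C sketch. The byte selection used here (bytes 8 to 15 of the first two 128-bit words and bytes 0 to 7 of the last two, taken after the feed-forward XOR) is inferred from the test vectors in this appendix; the truncation specified in the body of this document remains authoritative. All identifiers are assumptions of the sketch.</t>
        <artwork type="C"><![CDATA[
#include <stdint.h>
#include <string.h>

/* Areion-512 permutation output for input 00 01 ... 3f (vector #2). */
static const uint8_t PERM512_OUT_V2[64] = {
    0xb6, 0x90, 0xb8, 0x82, 0x97, 0xec, 0x47, 0x0b,
    0x07, 0xdd, 0xa9, 0x2b, 0x91, 0x95, 0x9c, 0xff,
    0x13, 0x5e, 0x9a, 0xc5, 0xfc, 0x3d, 0xc9, 0xb6,
    0x47, 0xa4, 0x3f, 0x4d, 0xaa, 0x8d, 0xa7, 0xa4,
    0xe0, 0xaf, 0xbd, 0xd8, 0xe6, 0xe2, 0x55, 0xc2,
    0x45, 0x27, 0x73, 0x6b, 0x29, 0x8b, 0xd6, 0x1d,
    0xe4, 0x60, 0xba, 0xb9, 0xea, 0x79, 0x15, 0xc6,
    0xd6, 0xdd, 0xbe, 0x05, 0xfe, 0x8d, 0xde, 0x40
};

/* Areion512-DM output for the same input (vector #2). */
static const uint8_t DM512_OUT_V2[32] = {
    0x0f, 0xd4, 0xa3, 0x20, 0x9d, 0x98, 0x92, 0xf0,
    0x5f, 0xbd, 0x25, 0x56, 0xb6, 0x90, 0xb9, 0xbb,
    0xc0, 0x8e, 0x9f, 0xfb, 0xc2, 0xc7, 0x73, 0xe5,
    0xd4, 0x51, 0x88, 0x8a, 0xde, 0x4c, 0x23, 0xf1
};

/* Feed-forward XOR followed by the truncation inferred from the vectors. */
static void dm512_output(uint8_t h[32], const uint8_t p[64],
                         const uint8_t x[64]) {
    uint8_t v[64];
    for (int i = 0; i < 64; i++) {
        v[i] = (uint8_t)(p[i] ^ x[i]);
    }
    memcpy(h,      v + 8,  8);   /* word 0, bytes 8..15 */
    memcpy(h + 8,  v + 24, 8);   /* word 1, bytes 8..15 */
    memcpy(h + 16, v + 32, 8);   /* word 2, bytes 0..7  */
    memcpy(h + 24, v + 48, 8);   /* word 3, bytes 0..7  */
}

/* Returns 1 if the two published vectors are mutually consistent. */
static int dm512_vector2_ok(void) {
    uint8_t x[64], h[32];
    for (int i = 0; i < 64; i++) {
        x[i] = (uint8_t)i;               /* input 00 01 ... 3f */
    }
    dm512_output(h, PERM512_OUT_V2, x);
    return memcmp(h, DM512_OUT_V2, 32) == 0;
}
]]></artwork>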
      </section>

      <section>
        <name>Areion512-MD</name>
        <t>The Areion512-MD hash function is the Merkle-Damgård construction based on the Areion-512 permutation and a Davies-Meyer compression function, as specified in this document. The following test vectors give the 256-bit digest for two messages; the padding and processing are fully defined in this document.</t>
        <artwork><![CDATA[
/* test vector #1 */
Input:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Output:
7f 22 34 44 5f 3a 72 00 65 93 79 42 01 53 6c 94
09 5d ab d3 fd b5 84 67 48 d3 59 55 5c 52 e6 51

/* test vector #2 */
Input:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
Output:
3e 4d 31 0f be 21 d0 7b b9 00 46 88 a1 50 36 b7
ab d9 ae 2f e9 e6 0c 9a ca 2a cc 36 98 5e 60 0b
]]></artwork>
      </section>

      <section>
        <name>Areion256-OPP AEAD</name>
        <t>The following test vectors correspond to the Areion256-OPP authenticated encryption scheme specified in this document. Each vector gives the key, nonce, associated data, plaintext, ciphertext, and authentication tag.</t>
        <artwork><![CDATA[
/* test vector #1 */
key:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
nonce:
0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00
associated data:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
plaintext:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
ciphertext:
a4 69 c0 ab 00 bf b6 8e 1f f3 74 54 b8 3d da 59
ef 61 1b 32 30 c0 a7 f0 a7 36 7c ab 36 c8 8a 59
d4 dc e1 ec 7e cb 9b ad b4 77 16 93 24 b9 22 b4
ef 04 17 8a 46 58 85 10 c2 44 ae 7b 7c bc 05 a0
tag:
76 12 8b 16 b6 cd 68 21 e3 7b df 58 69 27 61 a5
05 dd 89 f4 cc 81 b7 c9 28 96 53 d6 83 a7 a8 a7

/* test vector #2 */
key:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
nonce:
0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00
associated data:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
plaintext:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
ciphertext:
16 d7 b2 7a 50 0a a0 3e a1 d1 79 f3 26 63 b3 b9
e3 f0 41 b9 ba dd 0e 4d 59 f1 bf 87 82 5b 2a 30
f9 00 11 96 fd 45 30 6d 59 86 d7 a2 57 0c 6c 8a
df 68 8e 7e a2 0a 27 1b 61 e0 67 39 4f a2 85 5d
e8 71 76 5c ce 79 5b 4d 81 6c 7e b3 74 b1 66 6f
dc a1 de c1 af 22 8b bb eb 76 74 86 b8 52 08 c1
26 f2 b2 79 87 94 0b 03 00 f6 23 27 86 55 ba 5d
c9 db 3e bc 56 55 69 a0 f2 16 22 9d a4 a6 63 d8
tag:
25 d9 b9 09 41 45 e6 1f f0 f5 49 be 6d fe 81 a2
ec 7c e7 8c 8f c0 ba b0 d7 72 1b 9d 80 d4 76 f7
]]></artwork>
      </section>
    </section>
  </back>
</rfc>