Rule language reference

Introduction

The rule language uses a style similar to C. It works differently though. The language has its own data types and statement constructions that are specifically tailored to dealing with requests and responses in a proxy context.

Some of the most obvious peculiarities when compared to a 'normal' programming language are:

Statements

The language has the usual constructions for expressions and if-then-else statements. Only one statement causes a loop: the 'foreach' statement, which takes the value of an expression to control the loop. Specific to the task of processing requests and responses are the whitelist and blacklist statements.

Variables

A variable consists of the following items:



All of these variables and variable attributes may be referenced or modified in a rule program; most attributes can be accessed through function calls.

Expressions and operator precedence

Because the rule language is typeless, expressions must be written carefully. Since the only data type is a string, normal (numerical) comparison operators only work as intended if the data they operate on is exactly as expected. For instance,

i="one";
if (i==1) {
   ...
}

will produce unexpected results, because the value of i cannot be converted to a number. The conversion produces a default value (in current versions most likely 0, but this may change in a future version). The numerical == operator compares the converted value with 1 and returns false.
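
When the value being tested is text rather than a number, a regular expression avoids the numeric conversion altogether. The following is a minimal sketch that uses only the case-sensitive regexp form described under 'Regular expressions'; the variable i and the trace message are illustrative.

```
i="one";
if (i ~/^one$/) {
   // reached: the comparison is done on the string, not on a converted number
   trace("i holds the text one");
}
```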

Operators are grouped in precedence order in the usual way. Parentheses can be used to make the ordering explicit, or to change the order. The following code snippets give some examples of how this works:

i=2*3+1;           // i is set to 7
i=2*(3+1);         // i is set to 8
if (errorcode==302 && uri -/index\.html/) // evaluate errorcode==302,
                                          // then evaluate uri -/index\.html/
                                          // then do logical and

The operator precedence is according to the following table:

Operator        Description                           Associativity
!               logical not                           right
-/.../          case-insensitive regular expression   left
~/.../          case-sensitive regular expression     left
*, /            numerical multiplication, division    left
+, -            numerical addition, subtraction       left
<, >, <=, >=    numerical comparison                  left
!=, ==          numerical comparison                  left
&&              logical and                           left
||              logical or                            left
=               assignment                            left


Special variables

Some 'special' variables are used by Yxorp. These are:

uri: used when generating requests. The uri actually sent to the server is taken from this variable. In the response stage, changing it has no effect, but the changed value will be logged.

method: set to the method name. Use for reference only; changing this variable has no effect.

rejectreason: holds the text message explaining why the Yxorp base code rejected the request.

statuscode: holds the status code reported by the server that processed the request.

errorcode: if set, the error code from this variable is reported to the client instead of the default (400 – Bad Request) when a request is rejected.

errormessage: if set, the error reason string from this variable is reported to the client instead of the default (400 – Bad Request) when a request is rejected.

errorhtml: if set, the html code contained in this variable is inserted in the reject message.

errortitle: if set and errorhtml is not set, this variable defines the contents of the <title> tag in a reject page.

errorrejectreason: if set and errorhtml is not set, this variable overrides the default reason string in a reject page.

rejectedheaders: if Yxorp rejects a header (because it does not know the header, or because the header is overlength or contains illegal characters), the header name is added to this variable.

_pattern: set to the last executed regular expression pattern within the context of whitelist and blacklist statements. Not set for normal regular expressions.

_0 to _9: set after the execution of a regular expression, depending on the pattern. Only the variables needed to hold the data extracted by the pattern are updated; the other variables are left untouched. Note that this functionality is only available if you have PCRE included in the build.
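
As an illustration of the error variables, the following sketch sets a custom status before rejecting a request. It assumes that these variables also apply to rejects raised with the reject function; the code 403 and the texts are examples only. The closing return is a precaution, since this reference does not state whether reject alone ends the rule.

```
// report 403 instead of the default 400 when the request is rejected
errorcode = 403;
errormessage = "Forbidden";
errortitle = "Access denied";
reject("client not allowed");
return;
```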



Rule syntax

This section is a reference of the rule program syntax.

Program

program: statements

A program is the entirety of the source code in a single rule. A program (i.e. rule) is comprised of one or more statements.

Note that a rule can not be empty; at least one statement must be in the rule, or the rule will fail to compile.

Block

block: { statements }

A block is used to group one or more statements. As in many other languages, this is most often used together with other statements, like the if statement.

Statements

statement: if (expression) statement 
or         if (expression) statement else statement 
or         foreach identifier (expression) statement 
or         whitelist identifier { list-elements } if-failed statement 
or         blacklist identifier { list-elements } if-failed statement 
or         expression 
or         block
or         return

There are several statements, as listed here. Note that a block can contain statements; by extension, a statement that contains another statement can also contain a block, and thus more statements.

If

if:        if (expression) statement
or         if (expression) statement else statement

The if statement works exactly as you would expect. Note that the expression is expected to result in a truth value. In rules, true is anything that is not an empty string; false is an empty string.

The simple if statement (without else) can also be written as follows:

if-shorthand: '?' expression statement
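
Both forms can be sketched as follows. The example relies only on the truth rule stated above (any non-empty string is true); rejectreason is one of the special variables, and writing it to the trace log is purely illustrative. The parentheses in the shorthand form are optional according to the grammar and are used here only to keep the expression and the statement visually apart.

```
// long form: the block runs if rejectreason is not an empty string
if (rejectreason) {
   trace(rejectreason);
}

// shorthand form of the same test
? (rejectreason) {
   trace(rejectreason);
}
```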

Foreach

foreach: foreach identifier (expression) statement

The foreach statement implements loops. The expression in the statement is expected to result in a space separated list. The foreach statement loops once for each element in the list, setting the variable named in the identifier to the list element.

The foreach statement is supported by several function calls that deliver lists of variable names.
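
A minimal sketch of such a loop, using enumerate_reqhdr, concat and trace from the function reference. The loop variable h and the message text are illustrative.

```
// write the name of every request header variable to the trace log
foreach h (enumerate_reqhdr()) {
   trace(concat("request header variable: ", h));
}
```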

Whitelist

whitelist:        whitelist identifier { list-elements } if-failed statement

list-elements:    -/regexp/                     // simple, case insensitive
or                ~/regexp/                     // simple, case sensitive
or                -/regexp/ : list-statement    // complex, case insensitive
or                ~/regexp/ : list-statement    // complex, case sensitive

list-statement:   statement + continue-list

The whitelist statement is used to easily check the value of a variable against a set of regular expressions. If a match is found, the rest of the whitelist statement is skipped. If no match is found, the statement following the if-failed keyword is executed.

All forms of list-elements may be used in the same whitelist. There are no syntax restrictions to code formatting, but normal practice is to write each regular expression on its own line.

If a regular expression is in the complex form, the statement following the regular expression is executed before the normal whitelist action. In this context, a special statement is available with the continue-list keyword. If the continue-list statement is executed, the whitelist is resumed as if no match had occurred; this can be used to handle exception cases.

Normally, the if-failed clause is used to stop execution, reject a request, etc; however, note that this must be done explicitly, since the rule program language does not execute a default action.

After the whitelist completes, the special variable _pattern is set to the last executed pattern in a whitelist. This can be used to determine which pattern in the list matched. If whitelists are nested, the fact that _pattern contains the last executed pattern, not the successful match, may produce unexpected results.

Note that the scope in which _pattern is valid is different in a whitelist and a blacklist.
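
The following sketch shows a whitelist in the simple form, applied to the special variable method. The list of allowed methods and the reject text are examples only. The return after reject is a precaution, since this reference does not state whether reject alone ends the rule.

```
// accept only the listed methods; anything else is rejected
whitelist method {
   ~/^GET$/
   ~/^HEAD$/
   ~/^POST$/
} if-failed {
   reject("method not allowed");
   return;
}
```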

Blacklist

blacklist:        blacklist identifier { list-elements } if-failed statement

list-elements:    -/regexp/                     // simple, case insensitive
or                ~/regexp/                     // simple, case sensitive
or                -/regexp/ : list-statement    // complex, case insensitive
or                ~/regexp/ : list-statement    // complex, case sensitive

list-statement:   statement + continue-list

As with the whitelist statement, the blacklist statement is used to easily check the value of a variable against a set of regular expressions. If a match is found, the statement following the if-failed keyword is executed. If no match occurs, the if-failed clause is not executed.

All forms of list-elements may be used in the same blacklist. There are no syntax restrictions to code formatting, but normal practice is to write each regular expression on its own line.

If a regular expression is in the complex form, the statement following the regular expression is executed before the normal blacklist action. In this context, a special statement is available with the continue-list keyword. If the continue-list statement is executed, the blacklist is resumed as if no match had occurred; this can be used to handle exception cases.

Normally, the if-failed clause is used to stop execution, reject a request, etc; however, note that this must be done explicitly, since the rule program language does not execute a default action.

After the blacklist executes, and within the context of the if-failed clause, the _pattern special variable contains the pattern that last executed. If blacklists are nested, the fact that _pattern contains the last executed pattern, not the successful match, may produce unexpected results.

Note that the scope in which _pattern is valid is different in a whitelist and a blacklist.
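
A blacklist in the simple form might look like the sketch below. The patterns and messages are illustrative, not a recommended rule set. _pattern is read inside the if-failed clause, where it holds the pattern that last executed.

```
// reject uris that match any of the listed patterns
blacklist uri {
   -/\.\./
   -/<script/
   -/%00/
} if-failed {
   trace(concat("uri blocked by pattern: ", _pattern));
   reject("uri matched blacklist");
   return;
}
```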

Return


return:    return
or         return (expression)

The return statement ends the execution of the current rule program. Either version can be used, but the value of the expression in a return statement is currently not used. This may change in future versions.

Expression

The forms that an expression may have are listed below:

expression:   call(parameters)           // function call
or            literal                    // value
or            number                     // integral number value
or            identifier = expression    // assignment
or            identifier                 // variable
or            $identifier                // indirect variable
or            ! expression               // logical not
or            expression && expression   // logical and
or            expression || expression   // logical or
or            expression + expression    // numerical addition
or            expression - expression    // numerical subtraction
or            expression * expression    // numerical multiplication
or            expression / expression    // numerical division
or            expression == expression   // numerical compare equal
or            expression != expression   // numerical compare not equal
or            expression < expression    // numerical compare less
or            expression > expression    // numerical compare greater
or            expression <= expression   // numerical compare less or equal
or            expression >= expression   // numerical compare greater or equal
or            (expression)               // order expression priority
or            identifier -/regexp/       // case insensitive regexp
or            identifier ~/regexp/       // case sensitive regexp

Most expression forms are very common, and will not be detailed further.

Note that the numerical operations are only valid if the operands are actually numbers.

Identifier

identifier: letter { letter | digit | "_" | ":" }

Identifiers are used for variable names. They must start with a letter, and may contain digits, underscores, and colons. Colons should only be used at the end, and signify that the identifier points to a variable that is used to store the contents of a header.

Internally, identifiers are also used for function names, and will be used in future versions to refer to other rules.

Numbers

number: digit { digit }

Numbers are formed as one or more digits. Decimal points (i.e. floating point numbers) are not supported. Negative numbers are also not supported.

In the internal representation, all numbers are stored as strings. There is no difference between 0 and "0" in the rule syntax.

Literals

literal: '"' { character } '"'

Literals are values that are expressed directly in the source, as in "index.html". Think of them as strings.

Indirect variable

indirect variable : $identifier

An indirect variable is a variable whose value is the name of another variable. Used without the indirection operator '$', the identifier addresses the variable itself, so the expression yields the stored name. Used with the indirection operator, the stored name is looked up in turn, so the expression yields the value of the variable it points to.
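
The difference can be sketched as follows. The variable name is illustrative, and uri is the special variable described earlier.

```
name = "uri";
trace(name);    // writes the stored name, i.e. the text uri
trace($name);   // writes the value of the variable uri
```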

Regular expressions

regular expression: identifier -/regexp/       // case insensitive regexp
or                  identifier ~/regexp/       // case sensitive regexp

Regular expressions depend on the regular expression library you included (libc or PCRE). With the posix-libc variant, only basic regexps are available; see your man page for regcomp, regexec, or regex for a description of what you can do if you have this library. Yxorp uses this variant only if PCRE could not be found on your system during configuration. PCRE is highly recommended, as it has much more functionality and behaves more consistently across different types of system. If you are not sure which library you have, check with yxorp -V how your build is configured.

If you included PCRE, you can use most constructs that are possible in Perl. Extraction of matched data from the original string is supported using the _0, _1, up to _9 special variables. Note that as in Perl, only the variables that were actually necessary for storing matched data are updated.

Both case-sensitive and case-insensitive variants of the regexp calls are available. Use a tilde '~' for the case-sensitive variant and a minus '-' for the case-insensitive variant. The regexp itself must be enclosed in slashes. There is no implicit string begin or end added to the regexp; if you want to match against the start or end of a string, use ^ and $, respectively.
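
A minimal sketch of data extraction with a capture group, assuming a build that includes PCRE (the _1 variable is not available otherwise). The pattern and the trace text are illustrative.

```
// case-insensitive match on a trailing file extension
if (uri -/\.([a-z0-9]+)$/) {
   // _1 holds the text matched by the first pair of parentheses
   trace(concat("extension: ", _1));
}
```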

Function reference

basic_auth_check(realm, "local") – enforce basic authentication

Checks whether valid basic authentication credentials are present in the request (i.e. the Authorization: Basic header). If they are not found, the request is rejected with a 401 status code (normally causing a browser to show the userid/password dialog). The realm (text) is shown in this dialog window, and is also checked against the basic authentication credentials table.

If the authentication was successful, true is returned; otherwise false is returned and the request is rejected. Note that execution of the rule program does not stop if basic_auth_check fails; this must be done explicitly. Normally, a return statement should be used to prevent execution of the statements following the call to basic_auth_check. Typical use is as follows:

// check if basic authentication credentials are set
if (!basic_auth_check("my-realm", "local")) {
   return;          // end the rule, so that the implicit reject
                    // from basic_auth_check is processed
}
// reach here if basic_auth_check was successful

The second parameter to the basic_auth_check function must be exactly "local". This refers to the use of an internal table for the authentication credentials. Future versions may support other tables; then, this parameter will be used to indicate which table should be used.

clientinrange(range) – check if client IP is in a range

clientinrange checks if the ip address that the client uses on this connection is part of the IPv4 range that is passed. If the client address or the range can not be parsed, false will be returned. Note that the client address can not be parsed if it is in IPv6 format, as would happen if the request comes in from an IPv6 listener. The range must be in the format a.b.c.d/x, where 1<=x<=32.

Typical use is as follows:

// check where the request comes from
if (clientinrange("127.0.0.1/32")) {
   // allow some things
} else if (clientinrange("192.0.2.0/24")) {
   // allow some other things
} else {
   // don't allow things
   reject("sorry...");
}

clientinip6range(range) – check if client IP is in an IPv6 range

Similar to clientinrange, but for IPv6 addresses. The same limitations apply; IPv4 clients coming in through an IPv4 mode listener can not be correctly processed by this function.

The range can be specified in the following formats:

x:x:x:x:x:x:x:x/y            // default
x::x/y                       // missing parts are set to zeros
::a.b.c.d/y                  // deprecated transitional form
::ffff:a.b.c.d/y             // ipv4 mapped ipv6 address
::1/y                        // localhost

x = 4-digit hexadecimal
y = decimal, 1<=y<=128
a.b.c.d = ipv4 address range
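
A sketch of typical use, modelled on the clientinrange example. It assumes that the range is passed as a string literal; the prefix shown is the IPv6 documentation prefix, written out in the full form listed above, and the reject text is illustrative.

```
// check where the request comes from
if (clientinip6range("::1/128")) {
   // request comes from the local host
} else if (clientinip6range("2001:0db8:0000:0000:0000:0000:0000:0000/32")) {
   // request comes from the documentation prefix
} else {
   reject("sorry...");
}
```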

clientstate(type) – retrieve value from client state

clientstate retrieves values from the client state, depending on the string passed to it:

"toclient": the number of bytes sent to the client on this clientstate (cumulative)

"toserver": the number of bytes sent to the server on this clientstate (cumulative)

"fromclient": the number of bytes received from the client on this clientstate (cumulative)

"fromserver": the number of bytes received from the server on this clientstate (cumulative)

"hitcount": the number of requests processed on this clientstate

"id": the cookie value for this clientstate
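
For example, the counters can be written to the trace log as sketched below. The message texts are illustrative; what clientstate returns when no client state exists yet is not specified here.

```
trace(concat("requests so far: ", clientstate("hitcount")));
trace(concat("bytes sent to client: ", clientstate("toclient")));
```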

concat(...) – concatenate

concat takes a variable number of arguments and concatenates them. The concatenated string is returned.

contains(haystack, needle) – test if haystack contains needle

contains is equivalent to the C function strstr. It takes two arguments; the first is the haystack; the second is the needle. It returns true if the needle is found in the haystack; false otherwise. The comparison is case sensitive.

Note you can also use a regular expression; this is a lot more flexible.

contains_characters(string, list) – test if string contains characters in list

contains_characters takes two arguments; the first is a string, which is tested against a list (which is also a string, by the way). If any of the characters in the list occur in the string, true is returned; false otherwise. The comparison between characters is case sensitive.

containscase(haystack, needle) – test if haystack contains needle, ignoring case

containscase is equivalent to the C function strcasestr (if that exists on your platform). It takes two arguments; the first is the haystack; the second is the needle. It returns true if the needle is found in the haystack; false otherwise. The comparison is case insensitive.

Note you can also use a regular expression; this is a lot more flexible.
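
A short sketch of contains and containscase, assuming both arguments are passed as values (here the special variable uri and a literal). The needles and messages are illustrative.

```
// case sensitive: true if ".." occurs anywhere in the uri
if (contains(uri, "..")) {
   reject("path traversal attempt");
   return;
}

// case insensitive: true for admin, Admin, ADMIN, and so on
if (containscase(uri, "admin")) {
   trace("request for an admin page");
}
```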

enumerate_dupvar(variablename) – enumerate duplicate variables

enumerate_dupvar takes, as its only argument, a string containing the name of a variable. It returns a space separated string containing the real variable names of all variables in a duplicate variable set. If the variable is singular (i.e. there are no other members in the duplicate variable set) only the variable's name is returned.

enumerate_reqhdr() – enumerate request header variables

enumerate_reqhdr takes no arguments. It returns a space separated string containing the variable names of all variables that have the REQHDR attribute set. Yxorp sets this attribute for all variables it creates to represent request headers.

enumerate_rsphdr() – enumerate response header variables

enumerate_rsphdr takes no arguments. It returns a space separated string containing the variable names of all variables that have the RSPHDR attribute set. Yxorp sets this attribute for all variables it creates to represent response headers.

equal(s, r) – compare two variables

equal performs a case sensitive comparison of two variables.

equalcase(s, r) – case insensitive compare of two variables

equalcase performs a case insensitive comparison of two variables.

findheader(header) – retrieve contents of non-standard headers

findheader takes one parameter: the exact, case-sensitive name of a header. It returns the value of this header in the current request or response, or false if it is not found. findheader may only be used in request or response rules; the results are undefined for other rule types. The header is not processed in any way, and the settings in the <header> tags in the <globalconfiguration> section do not apply.
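
A minimal sketch of use in a request or response rule. The header name X-Example and the variable v are hypothetical. Because false is an empty string, a header that is present but empty cannot be told apart from a missing header in this test.

```
v = findheader("X-Example");
if (v) {
   trace(concat("X-Example: ", v));
}
```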

getattr(variablename, attr) – test attributes of a variable

getattr tests a variable for a specific attribute. The variable name must be set as the first parameter; the exact attribute as the second. If the attribute is present, true will be returned; if not, false will be returned.

getclientip() – get the client ip address

getclientip returns the ip address (IPv4 or IPv6) of the client associated with the current request.

getclientdomainname() – get the client domain name

getclientdomainname returns the domain name of the client associated with the current request. Note that using this function causes a domain name lookup to be done; this may impact performance, especially for large volumes.



getclientstatedvar(variablename) – read a variable from the client state

getclientstatedvar reads a variable from the client state, if one exists for this request. The variable name must be set as the first parameter. If the variable and the client state exist, the data will be returned; otherwise, an empty string (i.e. false) will be returned.

getlength(variablename) – returns the length of a variable

getlength returns the length of a variable. The name of the variable is passed to getlength.

getlistenerid() – returns the listener id

getlistenerid returns the id (i.e. name) of the listener that has received the request that is currently being processed.

getmaxcount(variablename) – return the number of variables in a dupvar group

getmaxcount returns the number of variables in a duplicate variable group.

getmaxlength(variablename) – return the maxlength attribute of a variable

getmaxlength returns the maxlength attribute of a variable.

getorder(variablename) – return the order attribute of a variable

getorder returns the order attribute of a variable.

getoriginalname(variablename) – return the originalname attribute of a variable

getoriginalname returns the originalname attribute of a variable.

getserverturnaround() – return the turnaround time for the server request

getserverturnaround returns the time, in milliseconds, that it took from sending the request to the server, to the response header being completely received. This gives an indication of server response; note that this time is however in some cases quite different from the response time an end user may experience.
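
As a sketch, slow server responses could be noted in the trace log as follows. The example assumes it runs in a response rule, where the turnaround time is known; the threshold of 2000 milliseconds is arbitrary.

```
if (getserverturnaround() > 2000) {
   trace(concat("slow server response for ", uri, ": ",
                getserverturnaround(), " ms"));
}
```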

getsslbackend() – return the sslbackend flag

getsslbackend returns the value of the sslbackend flag. If set, this means Yxorp will use SSL to the server; if not set, Yxorp will use plain HTTP.

getsticky() – return the sticky flag

getsticky returns the value of the sticky flag.

issslsession() – return the SSL state of the client session

issslsession returns true if the client session uses SSL; false otherwise.

redirect(url) – redirect request

Redirects the request by rejecting it with status code 307 (Temporary Redirect) and passing the URL in a Location: header. This causes a browser to redirect to this URL.
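
A sketch that sends clients on a plain HTTP session to an HTTPS address. The host name www.example.org is a placeholder, and the example assumes that uri holds the request path starting with a slash. The return is a precaution, since this reference does not state whether redirect alone ends the rule.

```
if (!issslsession()) {
   redirect(concat("https://www.example.org", uri));
   return;
}
```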

reject(reason) – reject request

The request is rejected with the specified reason. The reason string is also logged to the error log.

setattr(variablename, type) – set variable attribute

Sets the attribute on the specified variable. The following attribute values can be set:

RSPHDR: this variable is part of the response header group.

REQHDR: this variable is part of the request header group.

REJHDR: this variable is part of the reject header group.

setclientstatedvar(variablename, value) – set a variable in the client state

setclientstatedvar sets a variable in the client state, if one exists for this request. The variable name must be set as the first parameter; its value as the second. Note that client states are created after a request type rule runs, so on the first request from a client, the client state is not yet available. Also note that the number of slots in the client state dvar table is normally very limited (but it can be increased in the global configuration).
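
Together with getclientstatedvar, this can be used to remember something about a client between requests. A minimal sketch, in which the variable name seen and the stored value are hypothetical:

```
if (getclientstatedvar("seen")) {
   // a non-empty value was stored on an earlier request
   trace("returning client");
} else {
   // has no effect on the first request, when no client state exists yet
   setclientstatedvar("seen", "yes");
}
```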

setcookiedomain(string) – set cookie domain

This function sets the domain part of the state cookie that Yxorp uses for tracking or sticky load balancing. If no domain is set, the generally accepted behavior of clients is to only send back the cookie values to the same domain that set the cookie. If you use the domain part, you can cause the cookie values to be sent to several hostnames in a domain group (i.e. set the domain to .example.org to have the cookie sent to www.example.org, www2.example.org, etc).

setdup(variablename, value) – set duplicate variable

Sets the next instance of the duplicate variable to the specified value.

If, for example, a dupvar exists:

ex=an
ex1=example

the call setdup("ex", "test") would have the following result:


ex=an
ex1=example
ex2=test
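
The resulting set can then be walked with foreach and enumerate_dupvar, as in the sketch below. The loop variable v is illustrative; $v uses the indirection operator to read the value of each member.

```
setdup("ex", "test");
foreach v (enumerate_dupvar("ex")) {
   // writes one line per member, e.g. the name ex2 with its value test
   trace(concat(v, "=", $v));
}
```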

setmaxlength(variablename, length) – set variable max length attribute

Sets the maximum length attribute on the specified variable (after this, setting a longer value in a variable results in the value being truncated).

setorder(variablename, ordinal) – set variable order attribute

Sets the order attribute on the specified variable.

setoriginalname(variablename, originalname) – set variable originalname attribute

Sets the originalname attribute on the specified variable.

setsslbackend() – set ssl backend

Forces the server session for this request to use SSL.

setsticky() – set sticky flag

Sets the sticky flag for this request, causing sticky scheduling and client tracking for this request.

statuscodemsg(code) – returns status code textual message

The text message for the HTTP response code is returned.

trace(message) – send a message to the trace log

The message is written to the trace log. If no trace log is active, the message is discarded.

unsetsslbackend() – clear ssl backend

Stops forcing the server session for this request to use SSL.

unsetsticky() – clear sticky flag

Clears the sticky flag for this request, disabling sticky scheduling and client tracking for this request.

yesno(x) – return truth value as yes or no

yesno takes the truth value of the argument and returns yes (= true) or no (= false).
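
This is mainly useful for readable log output. A short sketch, with illustrative message texts:

```
trace(concat("ssl session: ", yesno(issslsession())));
trace(concat("sticky: ", yesno(getsticky())));
```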

The virtual machine

The code generated from the rule program sources is compiled, and executed by a virtual machine. The virtual machine implements a stack machine (like an RPN desk calculator) working on strings.

In general, the virtual machine is surprisingly fast. The instructions that the virtual machine processes are mostly very simple and straightforward. Even though the rule compiler is simple, the execution speed of the rule programs is not really impacted by the lack of optimization of the compiler. Most data manipulations are accomplished just by moving pointers around. Temporary variables are allocated in scratch memory pools; this speeds up execution because the memory allocation is much faster in this way, and also this ensures that all the used pointers will be valid for the duration of the VM run, even if memory shortages occur.

There is, of course, also a weak point: the virtual machine is not especially good at arithmetic. Since all data is represented as strings, calculations require several conversions. Number handling is also quite simplistic; if a string value is used in a numeric operation, the value will default to 0 instead of causing an error (this behavior is likely to change in a future version).

Because memory pools are used, there is a limit to the maximum size of an object. Currently this is on the order of 4K bytes, which should not be a limitation for typical use. If your application needs larger objects, source-level tuning is necessary.

Like all other Yxorp modules, the virtual machine contains a large number of debugging hooks. If you build Yxorp with debugging enabled, performance will be significantly lower.