Rule language reference

Introduction

The rule language uses a style similar to C. It works differently though. The language has its own data types and statement constructions that are specifically tailored to dealing with requests and responses in a proxy context.

Some of the most obvious peculiarities when compared to a 'normal' programming language are:

There is only one data type: a sequence of characters terminated by a binary zero (a 'string').
Literals must be enclosed in double quotes, unless every character is a digit. There is, however, no difference in how the literal is stored: numbers are represented in exactly the same way as strings. The handling of numbers is not especially strong; Yxorp was not designed to do number crunching.
Truth values are represented by an empty string for false, and a non-empty string for true. The virtual machine and the built-in functions tend to return true as a string containing "T".
Variables are not declared; the parser will automatically declare variables 'on the fly'. In general, if a variable is referenced in any way, it is created. Reading a non-existent variable will result in an empty string.
Variable names are case insensitive.
Variable names may include a colon (':'). A variable name ending in a colon is used for variables containing header values; creating (i.e. referencing) a variable name ending in a colon causes Yxorp to create an HTTP header with the tag set to the variable name, and the value set to the variable's value.
Variable names may not contain the minus sign ('-'). This is awkward, because several header names contain a minus and thus need to be translated. The default translation for a minus is an underscore ('_') and vice versa. This default translation is done automatically for all header-type variables (those ending in a colon) that have no predefined translation in the table of headers in the globalconfiguration.
Variables at runtime comprise the actual value and a number of attributes; these determine the actual header name (which can be different from the variable name in the rule), the order of the header in the list of headers to be sent, the maximum acceptable length, etc.
It is not possible to define functions. Function calls are available, but all called functions must be predefined, and their runtime must be included in the virtual machine.
Since version 0.21 it is possible to do a function-call to another rule, but this functionality is very limited. It is not (yet) possible to pass parameters, or retrieve a return value.
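
The truth-value and header-name conventions listed above can be modelled in a few lines of Python. This is only an illustrative sketch, not Yxorp code: the function names are hypothetical, and the translation shown is just the default minus/underscore mapping, ignoring any predefined translation from the header table in the globalconfiguration.

```python
def is_true(value: str) -> bool:
    """Rule-language truth test: the empty string is false, anything else is true."""
    return value != ""


def header_to_variable(header_name: str) -> str:
    """Default translation from a header name to a header-type variable name."""
    # '-' is not allowed in variable names, so it becomes '_';
    # the trailing colon marks the variable as a header variable.
    return header_name.replace("-", "_") + ":"


def variable_to_header(variable_name: str) -> str:
    """Default translation from a header-type variable name back to a header name."""
    if not variable_name.endswith(":"):
        raise ValueError("not a header-type variable: " + variable_name)
    return variable_name[:-1].replace("_", "-")
```

Under the stated definition, the string "0" counts as true, because it is not empty; only "" is false.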

Statements

The language has the usual constructions for expressions and if-then-else statements. There is only one statement that causes a loop: the 'foreach' statement, that takes the value of an expression to control the loop. Specific to the task of processing requests and responses are the whitelist and blacklist statements.

Variables

A variable consists of the following items:

The variable name, as used in the program source
The current value
The 'original name'; this is the case-sensitive token used to generate the actual header name.

The original name is not in any way dependent on the variable name; however, the variables created by Yxorp for RFC-defined headers are almost the same as the original names associated to the variables (an exception is the '-' character, which is replaced by '_').
The order number; this is used to specify the header order in a request or response. As some HTTP clients and servers are sensitive to the order in which headers are included in requests or responses, Yxorp keeps track of the original order of the headers. During processing of a request or response, Yxorp sets the order number of the first header in a request or response to 10, incrementing by 10 for each next header.
The variable's actual length.
The variable's maximum length.
A flag indicating if this variable has been truncated by the maximum length setting.
A flag marking this variable as a request header, response header, or a header set for rejecting requests.
A count field; in case multiple instances ('duplicate dvars' or 'dupvar') of a variable exist, the base variable will have its count field set to the highest sequence number of the set of multiple instances.

The first variable will have its name set to the base name; subsequent variables will have a sequence number suffixed to the base name, where the sequence number starts at 1 for the first duplicate.

Normally, these variables are only referenced through a foreach statement.

Yxorp uses duplicate variables to accommodate the fact that many HTTP clients and servers do not support header folding, as described in RFC 2616. Headers are converted into variables exactly as they are sent out by clients and servers; if a client sends two header lines for the same header, that is exactly how these will be represented as header variables.

All variables and variable attributes may be referenced or modified in a rule program. Variables can be directly accessed in the rule language; duplicate variables and attributes can be accessed through function calls and rule language constructs.
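
The order number attribute described above can be pictured with a small Python model. The function names are hypothetical, and the idea of slotting a new header between two existing ones by giving it an intermediate order number is an assumption about why the numbering steps by 10; the reference only states that attributes may be modified.

```python
def assign_order(header_names):
    """Number headers 10, 20, 30, ... in the order they were received."""
    return {name: 10 * (position + 1) for position, name in enumerate(header_names)}


def send_order(order_numbers):
    """Return the header variable names sorted by their order number."""
    return sorted(order_numbers, key=order_numbers.get)


orders = assign_order(["Host:", "User_Agent:", "Accept:"])
orders["X_Added:"] = 15   # assumption: an intermediate value places it after Host:
```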

Expressions and operator precedence

Because the rule language is typeless, expressions must be written carefully. Since the only data type is a string, normal (numerical) comparison operators only work as intended if the data they operate on is exactly as expected. For instance,

i="one";
if (i==1) {
   ...
}

will produce counterintuitive results, because the variable i cannot be converted to a number. The conversion will produce a default value (in the current versions most likely 0, but this may change in some future version). The numerical == operator will use the converted value, and return false.
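
The effect can be modelled in Python as follows. This is a sketch under assumptions, not the virtual machine's actual conversion routine: the helper names are hypothetical, the default of 0 is the "most likely" value mentioned above, and any string that is not all digits is treated as unconvertible.

```python
def to_number(value: str, default: int = 0) -> int:
    """Convert a rule string to an integer; fall back to a default otherwise."""
    # The rule syntax only knows unsigned integers (one or more digits).
    if value.isascii() and value.isdigit():
        return int(value)
    return default


def numeric_equal(left: str, right: str) -> str:
    """Model of the numerical == operator: "T" for true, "" for false."""
    return "T" if to_number(left) == to_number(right) else ""
```

In this model, comparing "one" with 1 is false, while comparing "one" with 0 is true, which is the kind of surprise the paragraph above warns about.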

Operators are grouped in precedence order in the usual way. Parentheses can be used to make the ordering explicit, or to change the order. The following code snippets give some examples of how this works:

i=2*3+1;           // i is set to 7
i=2*(3+1);         // i is set to 8
if (errorcode==302 && uri -/index\.html/) // evaluate errorcode==302,
                                          // then evaluate uri -/index\.html/
                                          // then do logical and

The operator precedence is according to the following table:

Operator	Description	Associativity
!	logical not	right
-/.../	caseless regular expression	left
~/.../	casefull regular expression	left
*, /	numerical multiplication, division	left
+, -	numerical addition, subtraction	left
<, >, <=, >=	numerical comparison	left
!=, ==	numerical comparison	left
&&	logical and	left
||	logical or	left
=	assignment	left

Special variables

Some 'special' variables are used by Yxorp. These are:

Name	Use
uri	Used when generating requests: the URI actually sent to the server is taken from this variable. In the response stage, changing it has no effect, but the changed value will be logged.
method	set to the method name. Use for reference only; changing this variable will have no effect.
rejectreason	holds the text message explaining the reason that the Yxorp base code rejected the request.
statuscode	holds the status code reported by the server that processed the request
errorcode	if set, the error code from this variable will be reported to the client instead of the default (400 – Bad Request) in case of a rejected request
errormessage	if set, the error reason string from this variable will be reported to the client instead of the default (400 – Bad Request) in case of a rejected request.
errorhtml	if set, html code contained in this variable will be inserted in the reject message.
errortitle	if set and errorhtml is not set, this variable defines the contents of the <title> tag in a reject page.
errorrejectreason	if set and errorhtml is not set, this variable allows an override of the default reason string in a reject page.
rejectedheaders	if Yxorp rejects a header (because it does not know this header, or because the header is overlength or contains illegal characters) the header name is added to this variable.
_pattern	Set to the last executed regular expression pattern within the context of whitelist and blacklist statements. Not set for normal regular expressions.
_[0-9]	Set after the execution of a regular expression, depending on the pattern. Only the variables needed to capture the data extraction from the pattern are updated; other variables are untouched. Note that this functionality is only available if you have PCRE included in the build.

Rule syntax

This section is a reference of the rule program syntax.

Program

program: statements

A program is the entirety of the source code in a single rule. A program (i.e. rule) is comprised of one or more statements.

Since version 2.27, a rule may be empty (i.e. contain no statements).

Block

block: { statements }

A block is used to group one or more statements. As in many other languages, this is most often used together with other statements, like the if statement.

Statement

statement: if (expression) statement 
or         if (expression) statement else statement 
or         foreach identifier (expression) statement 
or         whitelist identifier { list-elements } if-failed statement 
or         blacklist identifier { list-elements } if-failed statement 
or         expression 
or         block
or         return

There are several statements, as listed here. Note that a block can contain statements; by extension, a statement that contains another statement can also contain a block, and thus more statements.

If

if:        if (expression) statement
or         if (expression) statement else statement

The if statement works exactly as you would expect. Note that the expression is expected to result in a truth value. In rules, true is anything that is not an empty string; false is an empty string.

The simple if statement (without else) can also be written as follows:

if-shorthand: '?' expression statement

Foreach

foreach: foreach identifier (expression) statement

The foreach statement implements loops. The expression in the statement is expected to result in a space separated list. The foreach statement loops once for each element in the list, setting the variable named in the identifier to the list element.

The foreach statement is supported by several function calls, that deliver lists of variable names. In conjunction with the indirection operator “$”, this can be used to perform an operation on each element in a list of variables.
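
As a rough Python model of how foreach and the indirection operator work together: the variable store is a plain dictionary, and the names foreach and indirect are hypothetical helpers, not part of Yxorp. Splitting on whitespace is an assumption; the reference only says the list is space separated. Case-insensitive variable names are not modelled.

```python
def foreach(variables, loop_var, list_value, body):
    """Run body once per element of a space-separated list."""
    for element in list_value.split():
        variables[loop_var] = element   # the loop variable holds the list element
        body(variables)


def indirect(variables, name):
    """Model of $name: the value of `name` is used as the name of another variable."""
    # Reading a non-existent variable yields an empty string.
    return variables.get(variables.get(name, ""), "")


variables = {"Host:": "example.org", "Accept:": "*/*"}
seen = []
foreach(variables, "i", "Host: Accept:",
        lambda v: seen.append((v["i"], indirect(v, "i"))))
```

Here the loop variable i holds a variable name on each pass, while the indirect lookup yields that variable's value.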

Whitelist

whitelist:        whitelist identifier { list-elements } if-failed statement

list-elements:    -/regexp/                     // simple, case insensitive
or                ~/regexp/                     // simple, case sensitive
or                -/regexp/ : list-statement    // complex, case insensitive
or                ~/regexp/ : list-statement    // complex, case sensitive

list-statement:   statement + continue-list

The whitelist statement is used to easily check the value of a variable against a set of regular expressions. If a match is found, the rest of the whitelist statement is skipped. If no match is found, the statement following the if-failed keyword is executed.

All forms of list-elements may be used in the same whitelist. There are no syntax restrictions to code formatting, but normal practice is to write each regular expression on its own line.

If a regular expression is in the complex form, the statement following the regular expression is executed before the normal whitelist action. In this context, a special statement is available with the continue-list keyword. If the continue-list statement is executed, the whitelist is resumed as if no match had occurred; this can be used to handle exception cases.

Normally, the if-failed clause is used to stop execution, reject a request, etc; however, note that this must be done explicitly, since the rule program language does not execute a default action.

After the whitelist completes, the special variable _pattern is set to the last executed pattern in a whitelist. This can be used to determine which pattern in the list matched. If whitelists are nested, the fact that _pattern contains the last executed pattern, not the successful match, may produce unexpected results.

Note that the scope in which _pattern is valid is different in a whitelist and a blacklist.
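
The control flow of a whitelist with simple list-elements can be sketched in Python. This is a model under assumptions: the function name and the tuple layout are hypothetical, Python's re module stands in for the regular expression library, and the complex form with continue-list is left out.

```python
import re


def whitelist(value, elements):
    """Model a whitelist of simple list-elements.

    elements is a list of (pattern, case_sensitive) pairs.
    Returns (failed, last_pattern); failed is True when no pattern matched,
    which is when the if-failed statement would run.
    """
    last_pattern = ""
    for pattern, case_sensitive in elements:
        flags = 0 if case_sensitive else re.IGNORECASE
        last_pattern = pattern                # models the _pattern special variable
        if re.search(pattern, value, flags):
            return False, last_pattern        # match: the rest of the list is skipped
    return True, last_pattern                 # no match: if-failed is executed
```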

Blacklist

blacklist:        blacklist identifier { list-elements } if-failed statement

list-elements:    -/regexp/                     // simple, case insensitive
or                ~/regexp/                     // simple, case sensitive
or                -/regexp/ : list-statement    // complex, case insensitive
or                ~/regexp/ : list-statement    // complex, case sensitive

list-statement:   statement + continue-list

As with the whitelist statement, the blacklist statement is used to easily check the value of a variable against a set of regular expressions. If a match is found, the statement following the if-failed keyword is executed. If no match occurs, the if-failed clause is not executed.

All forms of list-elements may be used in the same blacklist. There are no syntax restrictions to code formatting, but normal practice is to write each regular expression on its own line.

If a regular expression is in the complex form, the statement following the regular expression is executed before the normal blacklist action. In this context, a special statement is available with the continue-list keyword. If the continue-list statement is executed, the blacklist is resumed as if no match had occurred; this can be used to handle exception cases.

Normally, the if-failed clause is used to stop execution, reject a request, etc; however, note that this must be done explicitly, since the rule program language does not execute a default action.

After the blacklist executes, and within the context of the if-failed clause, the _pattern special variable contains the pattern that last executed. If blacklists are nested, the fact that _pattern contains the last executed pattern, not the successful match, may produce unexpected results.

Note that the scope in which _pattern is valid is different in a whitelist and a blacklist.
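
The blacklist logic, including the complex form with continue-list, can be modelled in Python along these lines. Everything here is a hypothetical sketch: the CONTINUE_LIST marker, the function name, and the idea of representing a list-statement as a Python callable are assumptions made for illustration only.

```python
import re

CONTINUE_LIST = object()   # hypothetical marker: the list-statement ran continue-list


def blacklist(value, elements):
    """Model a blacklist.

    elements is a list of (pattern, case_sensitive, list_statement) triples,
    where list_statement is None for the simple form.
    Returns (failed, last_pattern); failed is True when the if-failed
    statement would run.
    """
    last_pattern = ""
    for pattern, case_sensitive, list_statement in elements:
        flags = 0 if case_sensitive else re.IGNORECASE
        last_pattern = pattern                # models the _pattern special variable
        if not re.search(pattern, value, flags):
            continue
        # Complex form: the attached statement runs before the normal action.
        if list_statement is not None and list_statement(value) is CONTINUE_LIST:
            continue                          # resume as if no match had occurred
        return True, last_pattern             # match: if-failed is executed
    return False, last_pattern
```

A list-statement that returns CONTINUE_LIST for a known-good exception lets the remaining patterns be tried instead of failing immediately.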

Return

return:    return
or         return (expression)

The return statement ends the execution of the current rule program. Either version can be used, but the value of the expression in a return statement is currently not used. This may change in future versions.

Expression

The forms that an expression may have are listed below:

expression:   call(parameters)           // function call
or            literal                    // value
or            number                     // integral number value
or            identifier = expression    // assignment
or            identifier                 // variable
or            $identifier                // indirect variable
or            ! expression               // logical not
or            expression && expression   // logical and
or            expression || expression   // logical or
or            expression + expression    // numerical addition
or            expression - expression    // numerical subtraction
or            expression * expression    // numerical multiplication
or            expression / expression    // numerical division
or            expression == expression   // numerical compare equal
or            expression != expression   // numerical compare not equal
or            expression < expression    // numerical compare less
or            expression > expression    // numerical compare greater
or            expression <= expression   // numerical compare less or equal
or            expression >= expression   // numerical compare greater or equal
or            (expression)               // order expression priority
or            identifier -/regexp/       // case insensitive regexp
or            identifier ~/regexp/       // case sensitive regexp

Most expression forms are very common, and will not be detailed further.

Note that the numerical operations are only valid if the operands are actually numbers.

Identifier

identifier: letter { letter | digit | "_" | ":" }

Identifiers are used for variable names. They must start with a letter, and may contain digits, underscores, and colons. Colons should only be used at the end, and signify that the identifier points to a variable that is used to store the contents of a header.

Internally, identifiers are also used for function names, and will be used in future versions to refer to other rules.

Numbers

number: digit { digit }

Numbers are formed as one or more digits. Decimal points (i.e. floating point numbers) are not supported. Negative numbers are also not supported.

In the internal representation, all numbers are stored as strings. There is no difference between 0 and "0" in the rule syntax.

Literals

literal: '"' { character } '"'

Literals are values that are expressed directly in the source, as in "index.html". Think of them as strings.

Indirect variable

indirect variable : $identifier

An indirect variable reference treats the value of the named variable as the name of another variable, and evaluates to the value of that other variable. Used without the indirection operator '$', the identifier addresses the variable itself; used with the operator, the variable's value is taken as the name of the variable to address.

Regular expressions

regular expression: identifier -/regexp/       // case insensitive regexp
or                  identifier ~/regexp/       // case sensitive regexp

Regular expression support depends on the regular expression library included in the build (libc or PCRE). With the POSIX libc variant, only basic regexps are available; see your man page for regcomp, regexec, or regex for a description of what that library supports. Yxorp uses this variant only if PCRE could not be found on your system during configuration. PCRE is highly recommended, as it offers much more functionality and behaves more consistently across different types of systems. If you are not sure which library you have, run yxorp -V to check how your build is configured.

If you included PCRE, you can use most constructs that are possible in Perl. Extraction of matched data from the original string is supported using the _0, _1, up to _9 special variables. Note that as in Perl, only the variables that were actually necessary for storing matched data are updated.

Both case-sensitive and case-insensitive variants of the regexp calls are available. Respectively use a tilde '~' or a minus '-' to specify which you want to use. The regexp itself must be enclosed in slashes. There is no implicit string begin or end added to the regexp; if you want to match against the start or end of a string, use ^ and $, respectively.
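
The matching and capture behaviour can be approximated in Python. This is a sketch only: Python's re module is not PCRE and differs in some syntax details, the function name is hypothetical, and leaving the capture variables untouched on a failed match is an assumption.

```python
import re


def regexp_match(variables, name, pattern, case_sensitive=False):
    """Model of `name -/pattern/` (default) or `name ~/pattern/` (case_sensitive)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.search adds no implicit anchors; use ^ and $ in the pattern if needed.
    match = re.search(pattern, variables.get(name, ""), flags)
    if match is None:
        return ""                              # false
    # Only the capture variables this pattern needs are updated.
    for index in range(min(match.re.groups, 9) + 1):
        variables["_%d" % index] = match.group(index) or ""
    return "T"                                 # true
```

With one capture group in the pattern, only _0 and _1 are written; a stale _2 from an earlier match keeps its old value.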

Function reference

basicauth_add(source, realm) - cache basicauth credentials

The basicauth_add function adds the basicauth credentials, presented in the Authorization: header present in the current request, to the basicauth cache table. The entry will be valid for a limited time only, and after it has become invalid it will be removed from the cache table by the basicauth maintenance process.

The time the cached entry is valid is defined in the globalconfiguration item basicauthcachetime.

basicauth_check(source, realm) - check basicauth credentials

The basicauth_check function checks the credentials presented in the Authorization: header with the current request with the basicauth table. If a match is found, either as a hard coded entry or a cached entry, the function returns true; else, it returns false.

basicauth_getpass() - read password from basicauth credentials

The basicauth_getpass function reads the password from the base64-encoded basic authentication credentials passed with the current request.

basicauth_getuser() - read userid from basicauth credentials

The basicauth_getuser function reads the userid from the base64-encoded basic authentication credentials passed with the current request.
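
For orientation, the decoding that basicauth_getuser and basicauth_getpass perform can be sketched in Python, following the standard layout of HTTP basic credentials (base64 of userid, a colon, and the password). The function name is hypothetical, and returning empty strings for malformed input is an assumption; the reference does not say how Yxorp handles that case.

```python
import base64


def parse_basic_credentials(authorization_value):
    """Split the value of an Authorization: Basic header into (userid, password)."""
    scheme, _, encoded = authorization_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return "", ""
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return "", ""                          # assumption: malformed input yields empty values
    # The userid ends at the first colon; the password may itself contain colons.
    userid, _, password = decoded.partition(":")
    return userid, password
```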

basicauth_reject(realm) - cause a 401 status code

The basicauth_reject function sets up the response for the current request to contain a 401 Unauthorized status code, including a WWW-Authenticate: header containing the realm as set as a parameter to the function. Also, the reject reason is set to the text “basicauth401”.

Please note that it is generally required to explicitly end the rule execution after calling this function, for instance by using the return statement. Failure to do so may have unintended consequences if the status code, reject reason, or WWW-Authenticate header are changed by further statements.

clearclientstate() – remove the client state

Synonym for killclientstate(). This function immediately removes all state information that is kept in the clientstate table for the current request. If a state cookie was previously sent to the client, no attempt is made to remove that state cookie from the client. A next request from this client will either carry a state cookie value that does not map to a valid state or no state cookie at all, and in both cases cause a new state to be created.

In many cases it is preferable to not immediately remove the state information. This situation would for instance occur while running a rule for a URI for a logoff page, which refers to other entities like graphics that still have to be retrieved. If in such a case the state information was immediately removed, retrieving the graphics entities would cause a new state entry to be created in the client state table. For this reason, there is also the setclientstatefastage() function.

clearsticky() – remove a sticky mapping from the client state

Synonym for killsticky(). This function removes the sticky mapping for the current value of the Host: header (as exists at the time of the call, and possibly changed by assignments to the Host: variable in this or previously run rules).

clientinrange(range) – check if client IP is in a range

clientinrange checks whether the IP address that the client uses on this connection is part of the IPv4 range that is passed. If the client address or the range cannot be parsed, false is returned. Note that the client address cannot be parsed if it is in IPv6 format, as happens when the request comes in through an IPv6 listener. The range must be in the format a.b.c.d/x, where 1<=x<=32.

Typical use is as follows:

// check where the request comes from
if (clientinrange("127.0.0.1/32")) {
   // allow some things
} else if (clientinrange("192.0.2.0/24")) {
   // allow some other things
} else {
   // don't allow things
   reject("sorry...");
}
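
The documented behaviour of the range test can be modelled with Python's ipaddress module. This is a sketch, not the Yxorp implementation: the function name is hypothetical, and accepting a range whose host bits are set (strict=False) is an assumption.

```python
import ipaddress


def client_in_range(client_ip, ip_range):
    """Return "T" if client_ip lies in the IPv4 range a.b.c.d/x, else ""."""
    _, separator, prefix = ip_range.partition("/")
    try:
        # The range must be written as a.b.c.d/x with 1<=x<=32.
        if not separator or not 1 <= int(prefix) <= 32:
            return ""
        address = ipaddress.IPv4Address(client_ip)      # fails for IPv6 clients
        network = ipaddress.IPv4Network(ip_range, strict=False)
    except ValueError:
        return ""                                       # unparsable input: false
    return "T" if address in network else ""
```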

clientinip6range(range) – check if client IP is in an IPv6 range

Similar to clientinrange, but for IPv6 addresses. The same limitations apply; IPv4 clients coming in through an IPv4 mode listener cannot be correctly processed by this function.

The range can be specified in the following formats:

x:x:x:x:x:x:x:x/y            // default
x::x/y                       // missing parts are set to zeros
::a.b.c.d/y                  // deprecated transitional form
::ffff:a.b.c.d/y             // ipv4 mapped ipv6 address
::1/y                        // localhost

x = 4-digit hexadecimal
y = decimal, 1<=y<=128
a.b.c.d = ipv4 address range

clientstate(type) – retrieve value from client state

clientstate retrieves values from the client state, depending on the string passed to it:

“toclient”: the number of bytes sent to the client on this clientstate (cumulative)

“toserver”: the number of bytes sent to the server on this clientstate (cumulative)

“fromclient”: the number of bytes received from the client on this clientstate (cumulative)

“fromserver”: the number of bytes received from the server on this clientstate (cumulative)

“hitcount”: the number of requests processed on this clientstate

“id”: the cookie value for this clientstate

concat(...) – concatenate

concat takes a variable number of arguments and concatenates them. The concatenated string is returned.

contains(haystack, needle) – test if haystack contains needle

contains is equivalent to the C function strstr. It takes two arguments; the first is the haystack; the second is the needle. It returns true if the needle is found in the haystack; false otherwise. The comparison is case sensitive.

Note you can also use a regular expression; this is a lot more flexible. The contains function will be deprecated in some future version.

contains_characters(string, list) – test if string contains characters in list

contains_characters takes two arguments; the first is a string, which is tested against a list (which is also a string, by the way). If any of the characters in the list occur in the string, true is returned; false otherwise. The comparison between characters is case sensitive.

Note you can also use a regular expression; this is a lot more flexible. The contains_characters function will be deprecated in some future version.

containscase(haystack, needle) – test if haystack contains needle, ignoring case

containscase is equivalent to the C function strcasestr (if that exists on your platform). It takes two arguments; the first is the haystack; the second is the needle. It returns true if the needle is found in the haystack; false otherwise. The comparison is case insensitive.

Note you can also use a regular expression; this is a lot more flexible. The containscase function will be deprecated in some future version.
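
The three string tests described above (contains, contains_characters and containscase) behave roughly like the following Python model. The Python functions merely mirror the documented results; lowercasing both strings to ignore case is an approximation of strcasestr that is only exact for ASCII text.

```python
def contains(haystack, needle):
    """Case-sensitive substring test, like strstr."""
    return "T" if needle in haystack else ""


def containscase(haystack, needle):
    """Case-insensitive substring test, like strcasestr."""
    return "T" if needle.lower() in haystack.lower() else ""


def contains_characters(string, chars):
    """True if any character from chars occurs in string (case sensitive)."""
    return "T" if any(ch in chars for ch in string) else ""
```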

digest_auth_check(realm, authentication source) – enforce digest authentication

digest_auth_check checks whether valid digest authentication credentials are present in the request (i.e. the Authorization: Digest header). If this header is not found, the request is rejected with a 401 status code (normally causing a browser to show the userid/password dialog). The realm (text) is shown in this dialog window.

If the authentication credentials are present in the request, and the authentication source parameter has the exact value "local", the request credentials are checked against the internal digestauth table (see the description of the <digestauth> table in the configuration reference). The internal table uses the realm name to enable different realms to use different userid/password combinations, or to enable a given user to be granted access to one resource, but not another.

Versions earlier than 2.33 contained an experimental ldap interface associated with this function call; this has been removed.

If the authentication was successful, true is returned; false otherwise (and the request rejected). Note that execution of the rule program does not stop if the digest_auth_check fails; this must be done explicitly; normally, a return statement should be used to prevent execution of statements following the call to digest_auth_check. Typical use is as:

// check if digest authentication credentials are set
if (!digest_auth_check("my-realm", "local")) {
   return;          // end the rule, so that the implicit reject
                    // from digest_auth_check is processed
}
// reach here if digest_auth_check was successful

enumerate_dupvar(variablename) – enumerate duplicate variables

enumerate_dupvar takes, as its only argument, a string containing the name of a variable. It returns a space separated string containing the real variable names of all variables in a duplicate variable set. If the variable is singular (i.e. there are no other members in the duplicate variable set) only the variable's name is returned.

enumerate_dupvar is especially useful when processing a header for which duplicates may exist (i.e. multiple headers with the same name), as in the following example:

// the following is wrong:
if (Cookie: -/blabla/) {
   // ... only the first Cookie header is checked
}
// since headers that may occur multiple times are stored in a dupvar group,
// as Cookie:, Cookie:0, Cookie:1 etc.

// check for all cookie headers
foreach i (enumerate_dupvar("Cookie:")) {
   if ($i -/blabla/) {
      // ... the cookie contains "blabla"
   }
}

enumerate_reqhdr() – enumerate request header variables

enumerate_reqhdr takes no arguments. It returns a space separated string containing the variable names of all variables that have the REQHDR attribute set. Yxorp sets this attribute for all variables it creates to represent request headers.

enumerate_rsphdr() – enumerate response header variables

enumerate_rsphdr takes no arguments. It returns a space separated string containing the variable names of all variables that have the RSPHDR attribute set. Yxorp sets this attribute for all variables it creates to represent response headers.

equal(s, r) – compare two variables

equal performs a case sensitive comparison of two variables.

Note you can also use a regular expression; this is a lot more flexible. The equal function will be deprecated in some future version.

equalcase(s, r) – case insensitive compare of two variables

equalcase performs a case insensitive comparison of two variables.

Note you can also use a regular expression; this is a lot more flexible. The equalcase function will be deprecated in some future version.

getattr(variablename, attr) – test attributes of a variable

getattr tests a variable for a specific attribute. The variable name must be set as the first parameter; the exact attribute as the second. If the attribute is present, true will be returned; if not, false will be returned. Currently two values for attr are recognized: “reqhdr” to test if a variable has been created by Yxorp as a request header variable, and “rsphdr”, for a response header variable.

getcachearea() – return cache area name

Returns the cache area configured for this request.

getclientip() – return the client ip address

getclientip returns the ip address (IPv4 or IPv6) of the client associated with the current request.

The value of the returned ip address may have been changed as a result of processing an X-Forwarded-For header, if Yxorp is configured to do so. See the chapter on XFF processing for details.

getclientcipherbits() – return the key length used in the current SSL connection

getclientcipherbits returns the key length used in the current SSL connection between a client and Yxorp.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getclientciphername() – return the cipher used in the current SSL connection

getclientciphername returns the name of the cipher used in the current SSL connection between a client and Yxorp.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getclientdomainname() – get the client domain name

getclientdomainname returns the domain name of the client associated with the current request. Note that using this function causes a domain name lookup to be done; this may impact performance, especially when Yxorp is handling a large volume of requests.

getclientcertfailcode() – return the client certificate failure code

getclientcertfailcode returns the code, as defined by OpenSSL, that identifies the failure that occurred while verifying the client certificate presented on the current SSL connection between a client and Yxorp. The code is obtained from the OpenSSL function SSL_get_verify_result.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getclientcertfailmsg() – return the client certificate failure message

getclientcertfailmsg returns the message, as produced by OpenSSL, that identifies the failure that occurred while verifying the client certificate presented on the current SSL connection between a client and Yxorp. The message is obtained from the OpenSSL function X509_verify_cert_error_string.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getclientcertinfo() – return client certificate information

getclientcertinfo returns information about the client certificate presented on the current SSL connection between a client and Yxorp. The information is in the form returned by the OpenSSL call X509_NAME_oneline.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getclientcertissuerinfo() – return client certificate issuer information

getclientcertissuerinfo returns information about the issuer of the client certificate presented on the current SSL connection between a client and Yxorp. The information is in the form returned by the OpenSSL call X509_NAME_oneline.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getclientstatedvar(variablename) – read a variable from the client state

getclientstatedvar reads a variable from the client state, if one exists for this request. The variable name must be passed as a string. If the variable and the client state exist, the data will be returned; otherwise, an empty string (i.e. false) will be returned.

getentity() – return the current entity

getentity returns the current entity. If there is none, false is returned. Normally, this function can only be used in inboundentity or outboundentity rules.

Note that the rule VM, as it works on zero-terminated strings, cannot currently process binary entities. Thus, if you are processing a request that carries a binary entity (e.g. a picture such as a JPEG or a bitmap), the entity will be returned by this function, but subsequent statements cannot process it completely.

getinboundentityrule() – return inboundentity rule

getinboundentityrule returns the name of the inboundentity rule. This attribute is normally set by the listener definition, and is reset to that value on session reuse; it may have been changed by a call to setinboundentityrule() executed in the context of the current request.

getlength(variablename) – return the length of a variable

getlength returns the length of a variable. The name of the variable is passed to getlength.

getlistenerid() – return the listener id

getlistenerid returns the id (i.e. name) of the listener that has received the request that is currently being processed.

getmaxcount(variablename) – return the number of variables in a dupvar group

getmaxcount returns the number of variables in a duplicate variable (dupvar) group.

getmaxlength(variablename) – return the maxlength attribute of a variable

getmaxlength returns the maxlength attribute of a variable.

getorder(variablename) – return the order attribute of a variable

getorder returns the order attribute of a variable.

getoriginalname(variablename) – return the originalname attribute of a variable

getoriginalname returns the originalname attribute of a variable.

getoutboundentityrule() – return outbound entity rule

Returns the name of the outbound entity rule. This attribute is normally set by the listener definition, and is reset to that value on session reuse; it may have been changed by a call to setoutboundentityrule() executed in the context of the current request.

getrejectrule() – return reject rule

Returns the name of the reject rule. This attribute is normally set by the listener definition, and is reset to that value on session reuse; it may have been changed by a call to setrejectrule() executed in the context of the current request.

getrequestnumber() – returns the ordinal number of this request

getrequestnumber returns the ordinal number of this request, since the start of Yxorp.

getresponserule() – return response rule

Returns the name of the response rule. This attribute is normally set by the listener definition, and is reset to that value on session reuse; it may have been changed by a call to setresponserule() executed in the context of the current request.

getservercertfailcode() – return the server certificate failure code

getservercertfailcode returns the code, as defined by OpenSSL, that identifies the failure that occurred while verifying the server certificate presented on the current SSL connection between Yxorp and a (real) server. The code is obtained from the OpenSSL function SSL_get_verify_result.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getservercertfailmsg() – return the server certificate failure message

getservercertfailmsg returns the message, as produced by OpenSSL, that identifies the failure that occurred while verifying the server certificate presented on the current SSL connection between Yxorp and a (real) server. The message is obtained from the OpenSSL function X509_verify_cert_error_string.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getservercertinfo() – return server certificate information

getservercertinfo returns information about the server certificate presented on the current SSL connection between Yxorp and a (real) server. The information is in the form returned by the OpenSSL call X509_NAME_oneline.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getservercertissuerinfo() – return server certificate issuer information

getservercertissuerinfo returns information about the issuer of the server certificate presented on the current SSL connection between Yxorp and a (real) server. The information is in the form returned by the OpenSSL call X509_NAME_oneline.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getservercipherbits() – return the key length used in the current SSL connection

getservercipherbits returns the key length used in the current SSL connection between Yxorp and a (real) server.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getserverciphername() – return the cipher used in the current SSL connection

getserverciphername returns the name of the cipher used in the current SSL connection between Yxorp and a (real) server.

If the current connection is not an SSL connection, or Yxorp is built without SSL support, false is returned.

getserverturnaround() – return the turnaround time for the server request

getserverturnaround returns the time, in milliseconds, from sending the request to the server until the response header has been completely received. This gives an indication of server responsiveness; note, however, that this time can differ considerably from the response time an end user experiences. The difference includes at least the network time between Yxorp and the client, and also the time it takes the client to render a page.

getsslbackend() – return the sslbackend flag

getsslbackend returns the value of the sslbackend flag. If set, this means Yxorp will use SSL to the server; if not set, Yxorp will use plain HTTP.

getsticky() – return the sticky flag

getsticky returns the value of the sticky flag.

isccertverified() – return verification state of client certificate

isccertverified() returns true if the certificate used on the SSL session between a client and Yxorp could be verified. If the session does not use SSL, or if Yxorp was built without SSL support, false is returned.

isscertverified() – return verification state of server certificate

isscertverified() returns true if the certificate used on the SSL session between Yxorp and a (real) server could be verified. If the session does not use SSL, or if Yxorp was built without SSL support, false is returned.

issslsession() – return the SSL state of the client session

issslsession returns true if the client session uses SSL; false otherwise.

killclientstate() – remove the client state

Synonym for clearclientstate().

killsticky() – remove a sticky mapping from the client state

Synonym for clearsticky().

killvar(variablename) – remove a variable

Synonym for killvariable().

killvariable(variablename) – remove a variable

killvariable removes the variable whose name is passed as a parameter from the variable pool. This is especially useful if you want to keep a header from being passed on.

// Don't send the Server: header at all
killvariable("Server:");

// Don't send any Cookie: headers
foreach i (enumerate_dupvar("Cookie:")) {
   killvariable($i);
}

ldap_bind(dn, pass)

ldap_bind executes a bind operation against the ldap server set by a previous call to ldap_set_serveruri, binding as the dn that is passed and using the pass parameter as the password.

ldap_init()

ldap_init causes the internal structures required for interaction with an ldap server to be initialized, and must be called before any ldap library function is invoked through any of the ldap functions other than ldap_set_<whatever>. Although this function may become redundant in a version after Yxorp 2.33, it is good practice to call this function before invoking ldap_search, ldap_bind or any other ldap action function.

ldap_search(filter)

Initiates a search operation towards the ldap server that has been identified with the ldap_set_serveruri function, respecting the parameters set by the ldap_set_searchbase, ldap_set_searchdn, ldap_set_searchpw and ldap_set_searchscope functions. These parameters are used to bind to the ldap server (a so-called 'administrative bind'); the connection thus created is used for the search operation. The search may return one or more dn values that match the criteria set by the filter and by the ldap_set_<x> functions. These dn values are stored in dupvar variables with the base name “dn”; ldap_search returns the number of dupvars that have been set in this way. Normally, for a successful implementation, the number of dn values returned should be exactly one if the user is found, or zero if not; any other value (for example, when multiple definitions in the ldap server match this user) is not generally usable.

The filter parameter is a string representation of the filter to apply in the search. The string should conform to the format specified in RFC 4515, as extended by RFC 4526. For instance, "(cn=Jane Doe)".
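The return value described above can be interpreted as in the following sketch. It is written in Python, purely as a model of the documented behavior rather than as rule language code; the function name classify_search_result and the result labels are hypothetical and not part of Yxorp.

```python
# Hypothetical model (not Yxorp code): interpret the number of dn values
# that ldap_search reports having stored in the "dn" dupvar group.
def classify_search_result(count: int) -> str:
    if count < 0:
        raise ValueError("a dupvar count cannot be negative")
    if count == 0:
        return "not found"   # no entry matched the filter
    if count == 1:
        return "found"       # exactly one dn: the expected case for a known user
    return "ambiguous"       # several entries matched: not generally usable
```

A rule would typically only continue with the dn when the result corresponds to "found".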

ldap_set_searchbase(base)

Sets the base dn that will be used in search operations initiated by the ldap_search call following the use of this function.

ldap_set_searchdn(dn)

Sets the dn that will be used in search operations initiated by the ldap_search call following the use of this function.

ldap_set_searchpw(pw)

Sets the password that will be used in search operations initiated by the ldap_search call following the use of this function.

ldap_set_searchscope(scope)

This function sets the search scope parameter in the ldap structure for use by a subsequent ldap_search call. The scope parameter should be one of the string values LDAP_SCOPE_BASE, to search the object itself, LDAP_SCOPE_ONELEVEL, to search the object’s immediate children, LDAP_SCOPE_SUBTREE, to search the object and all its descendants, or LDAP_SCOPE_CHILDREN, to search all of the descendants. Note that the latter requires that the server support the LDAP Subordinates Search Scope extension. If the string value does not match one of the values listed here, exactly and in the correct case, the default of LDAP_SCOPE_SUBTREE will be applied.

See the documentation of the ldap client library and ldap server for further details.
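The fallback rule for the scope parameter can be modelled as follows. This is a Python sketch of the behavior described above, not Yxorp source code; the names VALID_SCOPES, DEFAULT_SCOPE and resolve_scope are hypothetical.

```python
# Hypothetical model (not Yxorp code) of the scope fallback described above.
VALID_SCOPES = (
    "LDAP_SCOPE_BASE",
    "LDAP_SCOPE_ONELEVEL",
    "LDAP_SCOPE_SUBTREE",
    "LDAP_SCOPE_CHILDREN",
)
DEFAULT_SCOPE = "LDAP_SCOPE_SUBTREE"

def resolve_scope(value: str) -> str:
    # The comparison is exact and case sensitive; anything else gets the default.
    return value if value in VALID_SCOPES else DEFAULT_SCOPE
```

Under this model a misspelled or lower-case scope silently becomes a subtree search, which is worth keeping in mind when a search returns more dn values than expected.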

ldap_set_serveruri(uri)

This function sets the uri to be used in a subsequent ldap interaction. The format of the uri is as described in the documentation of the ldap_initialize function. The value presented here will be passed to the ldap_initialize function in the ldap client library; see the documentation of the ldap client library for further details.

redirect(url) – redirect request

Redirects the request by rejecting it with status code 307 (Temporary Redirect) and passing the URL in a Location: header. This causes a browser to redirect to this URL.

rdlog(numerical value) – set rdlog master flag for the current request

Calling rdlog() with a non-zero value causes the master rdlog flag to be set for this request. If the value is zero, no request detail log actions will be taken for this request. The effect varies according to the type of rule in which the call is executed.

rdlogfinalinternaldata(numerical value) – include dump of internal structures

Calling rdlogfinalinternaldata() with a non-zero value causes Yxorp's internal data structures for this request to be logged at the end of the request. The status of the flag is examined at the end of the request.

rdlogerror(numerical value) – set rdlog error flag for the current request

Calling rdlogerror() with a non-zero value causes the rdlog error flag to be set for this request. If the flag is zero, request detail log actions are taken for this request irrespective of its result. If it is non-zero, only requests ending in an error state are logged to the rdlog. The status of the flag is examined at the end of the request.
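One possible reading of how the master flag (set with rdlog()) and the error flag (set with rdlogerror()) combine is sketched below in Python. The combination itself is an assumption drawn from the two descriptions, and the function name should_write_rdlog is hypothetical; consult the description of <requestdetaillog> for the authoritative behavior.

```python
# Hypothetical model (not Yxorp code) of how the two flags described above
# might combine when deciding whether a request reaches the request detail log.
def should_write_rdlog(master_flag: int, error_flag: int, ended_in_error: bool) -> bool:
    if master_flag == 0:
        return False          # master flag clear: no rdlog actions at all
    if error_flag == 0:
        return True           # log irrespective of the result of the request
    return ended_in_error     # error flag set: log only requests that failed
```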

rdlogreceivedrequest(numerical value) – include dump of received request

Calling rdlogreceivedrequest() with a non-zero value causes the data that was received from the client to be logged. The status of the flag is examined during the request phase, but (currently) before any rule executes. Hence, this function makes no sense (yet).

The value passed to this function is a bitmask. See the description of <requestdetaillog> for the meaning of the bits.

rdlogreceivedresponse(numerical value) – include dump of received response

Calling rdlogreceivedresponse() with a non-zero value causes the data that was received from the server to be logged. The status of the flag is examined during the response phase.

The value passed to this function is a bitmask. See the description of <requestdetaillog> for the meaning of the bits.

rdlogtransmittedrequest(numerical value) – include dump of transmitted request

Calling rdlogtransmittedrequest() with a non-zero value causes the data that was sent to the server to be logged. The status of the flag is examined during the request phase.

The value passed to this function is a bitmask. See the description of <requestdetaillog> for the meaning of the bits.

rdlogtransmittedresponse(numerical value) – include dump of transmitted response

Calling rdlogtransmittedresponse() with a non-zero value causes the data that was sent to the client to be logged. The status of the flag is examined during the response phase.

The value passed to this function is a bitmask. See the description of <requestdetaillog> for the meaning of the bits.

reject(reason) – reject request

The request is rejected with the specified reason. The reason string is also logged to the error log.

sanitizexforwardedfor(trusted proxy IP ranges) – check an X-Forwarded-For: header

sanitizexforwardedfor scans a received X-Forwarded-For header to verify that all proxy addresses are in the list of IP ranges passed as arguments. A variable number of arguments can be passed.

See the chapter on XFF processing for details.

setattr(variablename, type) – set variable attribute

Sets the attribute on the specified variable. The following attribute values can be set:

RSPHDR: this variable is part of the response header group.

REQHDR: this variable is part of the request header group.

REJHDR: this variable is part of the reject header group.

setcachearea(cachearea) – set cache area name

Sets the cache area to use for this request.

setclientipfromxff() – set client ip from X-Forwarded-For: header

If an X-Forwarded-For: header exists, and it has been sanitized by calling one of the sanitizexforwardedfor**() functions, calling setclientipfromxff() changes Yxorp's record of the client IP address to the client address taken from the X-Forwarded-For: header. For instance, if you call setclientipfromxff() and then getclientip() or one of the clientiprange**() functions, these functions will use the client address from the X-Forwarded-For: header.

setclientstatedvar(variablename, value) – set a variable in the client state

setclientstatedvar sets a variable in the client state, if one exists for this request. The variable name must be passed as the first parameter, and its value as the second. Note that client states are created after a request type rule runs, so on the first request from a client, the client state is not yet available. Also note that the number of slots in the client state dvar table is normally very limited (but it can be increased in the global configuration).

setclientstatefastage(seconds) – schedule removal of the client state

After the call, the client state will remain active for at least the specified number of seconds. After that time expires, the client state will be cleared by the first client state cleanup run; this cleanup run may occur anywhere between immediately after the seconds expire, and the clientstatecleanupinterval set in the globalconfiguration.

Unless you have clients on very slow links or very complicated logoff pages, 10 seconds should be a reasonable setting for the above scenario.

setclientstateid(id) – manually set clientstate id for this request

When a client state is needed, normally Yxorp will automatically create a client state id, and use a cookie to store the value of the client state id in the client browser. If this is not possible, for instance because the client has cookies disabled, it is possible to use some other attribute of the request as a client state id.

When manually maintaining the client state, the setclientstateid function must be called in a request rule for all requests that need access to the client state (for instance, when using the client state for sticky load balancing, this means all requests). The client state id passed to the function must be exactly the same string for all requests that need to map to the same client state. Care must be taken that different clients do not use the same state; otherwise, unpredictable results may occur when dealing with the client state.

When manually maintaining client states, the automatic creation of client states must be disabled. See setclientstateidgenerate on how to do this.

setclientstateidgenerate(value) – set automatic client state id generation

When a client state is needed, normally Yxorp will automatically create a client state id, and use a cookie to store the value of the client state id in the client browser. If this must be prevented, for instance because the setclientstateid function is used to manually create the client state id, the setclientstateidgenerate function must be called with a zero argument. All non-zero arguments will result in the default behavior where Yxorp will generate the client state id.

setconnectservertimeout(timeout) – set serverend timeout value

Sets the timeout (in milliseconds) that is used for connecting to a server. The value is valid only during the processing of the request that is being executed. The default value, which applies if no call to this function is executed, is taken from <globalconfiguration>.

setcookiedomain(string) – set cookie domain

This function sets the domain part of the state cookie that Yxorp uses for tracking or sticky load balancing. If no domain is set, the generally accepted behavior of clients is to send the cookie back only to the same host that set it. If you use the domain part, you can cause the cookie values to be sent to several hostnames in a domain group (e.g. set the domain to .example.org to have the cookie sent to www.example.org, www2.example.org, etc).
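The effect of the domain part on where a client returns the cookie can be modelled roughly as below. This is a simplified Python sketch of cookie domain matching as clients generally perform it; it is not Yxorp code, real clients apply additional rules, and the function name cookie_is_sent_to is hypothetical.

```python
# Simplified, hypothetical model of client-side cookie domain matching.
from typing import Optional

def cookie_is_sent_to(request_host: str, setting_host: str,
                      cookie_domain: Optional[str] = None) -> bool:
    host = request_host.lower()
    if not cookie_domain:
        # No domain set: the cookie goes back only to the host that set it.
        return host == setting_host.lower()
    suffix = cookie_domain.lower().lstrip(".")
    # Domain set: the domain itself and any host below it match.
    return host == suffix or host.endswith("." + suffix)
```

With the domain set to .example.org, both www.example.org and www2.example.org match, while a host such as www.example.com does not.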

setdup(variablename, value) – set duplicate variable

Sets the next instance of the duplicate variable to the specified value.

If, for example, a dupvar exists:

ex=an
ex1=example

the call setdup("ex", "test") would have the following result:

ex=an
ex1=example
ex2=test
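The numbering in this example can be modelled with the Python sketch below. It is an illustration only, not Yxorp code: the variable pool is represented as a plain dictionary, variable attributes and case insensitivity are ignored, and the assumption that the first instance carries the bare name is taken from the example above.

```python
# Hypothetical model (not Yxorp code) of how setdup picks the next instance name.
def setdup(pool: dict, name: str, value: str) -> str:
    if name not in pool:
        # Assumption: the first instance uses the bare name.
        pool[name] = value
        return name
    index = 1
    while f"{name}{index}" in pool:
        index += 1                      # skip instances that already exist
    pool[f"{name}{index}"] = value
    return f"{name}{index}"

pool = {"ex": "an", "ex1": "example"}
created = setdup(pool, "ex", "test")    # stores the value under "ex2"
```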

setmaxlength(variablename, length) – set variable max length attribute

Sets the maximum length attribute on the specified variable (after this, setting a longer value in a variable results in the value being truncated).

setinboundentityrule(rulename) – set inbound entity rule

Sets the inbound entity rule to run. This attribute is normally set by the listener definition, and is reset on session reuse.

setorder(variablename, ordinal) – set variable order attribute

Sets the order attribute on the specified variable.

setoriginalname(variablename, originalname) – set variable originalname attribute

Sets the originalname attribute on the specified variable.

setoutboundentityrule(rulename) – set outbound entity rule

Sets the outbound entity rule to run. This attribute is normally set by the listener definition, and is reset on session reuse.

setreadfromservertimeout(timeout) – set serverend timeout value

Sets the timeout (in milliseconds) that is used for reading from a server. The timeout applies to each read operation, not to the processing of the entire response; the server has to send at least some data within the timeout. The value is valid only during the processing of the request that is being executed. The default value, which applies if no call to this function is executed, is taken from <globalconfiguration>.

setresponserule(rulename) – set response rule

Sets the response rule to run. This attribute is normally set by the listener definition, and is reset on session reuse.

setrejectrule(rulename) – set reject rule to run

Sets the reject rule to run. This attribute is normally set by the listener definition, and is reset on session reuse.

setsendxforwardedfor() – send X-Forwarded-For: header to server

Calling setsendxforwardedfor() in a request rule causes Yxorp to send an X-Forwarded-For: header to a server. If the header is considered sanitized (i.e. after calling the sanitizexforwardedfor() function), the sanitized part of the header sent in by the client will be included, and Yxorp will append to the header the IP address from which it received the request.

If no X-Forwarded-For was sent in by the client, if X-Forwarded-For was disabled as a header in Yxorp's header table, or if no call to the sanitizexforwardedfor() function was executed, Yxorp will send an X-Forwarded-For containing the IP address of the client connection.

See the chapter on XFF processing for details.

setsslbackend() – set ssl backend

Forces the server session for this request to use SSL.

setsslserverconnecttimeout(timeout) – set serverend ssl timer

This function sets the timer that is used when connecting to a server. The entire connect operation, including SSL handshake, certificate validation etc. must complete within this time. The timeout is specified in milliseconds, and valid only during the processing of the request that is being executed. The default value, that applies if no call to this function is executed, is taken from <globalconfiguration>.

setsslserverreadtimeout(timeout) – set serverend ssl timer

This function sets the timer that is used when reading from a server. The timeout is specified in milliseconds, and valid only during the processing of the request that is being executed. The default value, that applies if no call to this function is executed, is taken from <globalconfiguration>.

setsslserverwritetimeout(timeout) – set serverend ssl timer

This function sets the timer that is used when writing to a server. The timeout is specified in milliseconds, and valid only during the processing of the request that is being executed. The default value, that applies if no call to this function is executed, is taken from <globalconfiguration>.

setsslserverclosetimeout(timeout) – set serverend ssl timer

This function sets the timer that is used when closing a server connection. The timeout is specified in milliseconds, and valid only during the processing of the request that is being executed. The default value, that applies if no call to this function is executed, is taken from <globalconfiguration>.

setsticky() – set sticky flag

Sets the sticky flag for this request, causing sticky scheduling and client tracking for this request.

settargetserver(id) – set target server

Normally Yxorp uses the value of the Host: header to determine which server to send a request to. If however the settargetserver() function is used, Yxorp will send a request to the server whose id is passed as a parameter. This can be useful if a large number of virtual hosts is defined on a group of real servers, for which Yxorp is to do load balancing; in this case, Yxorp does not need to be aware of each individual virtual host domain.

setwritetoservertimeout(timeout) – set serverend timeout value

Sets the timeout (in milliseconds) that is used for writing to a server. The timeout applies to each write operation, not to the processing of the entire request. The value is valid only during the processing of the request that is being executed. The default value, which applies if no call to this function is executed, is taken from <globalconfiguration>.

statuscodemsg(code) – returns status code textual message

The text message for the HTTP response code, as defined in the global configuration, is returned.

strremove(haystack, needle) – remove matching string

The strremove function returns a copy of the string “haystack” from which the first occurrence of the string “needle” has been removed. If “haystack” does not contain any occurrence of the string “needle”, the entire “haystack” is returned.

The strremove function is especially useful in conjunction with regular expressions, as in the following example:

a="aapnootmies";
a -/(aap)(noot)(mies)/;

tmp1=strremove(a, _1);
// resulting value of tmp1 is “nootmies”
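For readers who want to check the semantics outside Yxorp, the example above can be mirrored in Python. This is a model of the described behavior, not the Yxorp implementation; the variable group1 stands in for the rule variable _1.

```python
# Hypothetical Python model of strremove: remove only the first occurrence.
import re

def strremove(haystack: str, needle: str) -> str:
    # str.replace with a count of 1 leaves later occurrences untouched and
    # returns the haystack unchanged when the needle does not occur.
    return haystack.replace(needle, "", 1)

a = "aapnootmies"
match = re.match(r"(aap)(noot)(mies)", a)
group1 = match.group(1)          # "aap", the first capture group
tmp1 = strremove(a, group1)      # "nootmies"
```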

trace(message) – send a message to the trace log

The message is written to the trace log. If no trace log is active, the message is discarded.

unsetsslbackend() – clear ssl backend

Stops forcing the server session for this request to use SSL.

unsetsticky() – clear sticky flag

Clears the sticky flag for this request, disabling sticky scheduling and client tracking for this request.

writetofile(filename, value) – write to a file

writetofile takes the value of the second argument, and writes that to a file. The name of the file is set to the value of the first argument.

As this is one of the very few reasons for Yxorp to access the file system and thus poses security risks, this function may be removed for security reasons. If you do not want this to happen, inform the authors.

writeentitytofile(filename) – writes entity to a file

writeentitytofile writes the current entity (if any) to a file. The name of the file is set to the value of the first argument.

yesno(x) – return truth value as yes or no

yesno takes the truth value of the argument and returns yes (= true) or no (= false).
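Since the rule language represents false as the empty string and true as any non-empty string, yesno can be modelled as in the Python sketch below. This is an illustration of the stated rule, not Yxorp code.

```python
# Hypothetical model (not Yxorp code): truth values are strings, where the
# empty string means false and any non-empty string means true.
def yesno(value: str) -> str:
    return "yes" if value != "" else "no"
```

For example, the value “T” that built-in functions tend to return for true maps to yes, and the empty string maps to no.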

The virtual machine

The code generated from the rule program sources is compiled, and executed by a virtual machine. The virtual machine implements a stack machine (like an RPN desk calculator) working on strings.
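To make the idea of a stack machine working on strings concrete, here is a minimal Python sketch. It does not reflect Yxorp's actual instruction set: the opcode names PUSH, CONCAT and EQ are invented for illustration, and only the convention of returning “T” for true and an empty string for false is taken from this document.

```python
# Hypothetical string stack machine; the opcodes are invented for illustration.
def run(program):
    stack = []
    for instruction in program:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])          # push a string literal
        elif op == "CONCAT":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)            # replace two operands by their join
        elif op == "EQ":
            right = stack.pop()
            left = stack.pop()
            stack.append("T" if left == right else "")   # truth value as a string
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

program = [
    ("PUSH", "www."),
    ("PUSH", "example.org"),
    ("CONCAT",),
    ("PUSH", "www.example.org"),
    ("EQ",),
]
result = run(program)   # "T"
```

As on an RPN calculator, operands are pushed first and each operation consumes its operands from the top of the stack.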

In general, the virtual machine is very fast: it is designed and tuned to process strings quickly, and it does. The instructions that the virtual machine processes are almost all very simple and straightforward. Even though the rule compiler is simple and straightforward, optimizations are automatically done for both trivial and expensive virtual machine instructions. Most data manipulations are accomplished just by moving pointers around. Temporary variables are allocated in scratch memory pools; this speeds up execution because memory allocation is done only a few times, if not just once. It also ensures that all pointers in use remain valid for the duration of the VM run, even if memory shortages occur.

There is, of course, also a weak point: the virtual machine is not especially good at arithmetic, or at handling data that is poorly expressed as strings. Calculations require several conversions. Number handling is quite simplistic; if a non-numeric string value is used in a numeric operation, the value will just default to 0 instead of causing an error (this behavior is likely to change in a future version). The value of false (i.e. the empty string) may be assumed to be treated as 0.
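The conversion behavior described above can be modelled in Python as follows. This is a sketch under stated assumptions, not Yxorp code: it handles integers only, and the helper names to_number and add are hypothetical.

```python
# Hypothetical model (not Yxorp code) of string-based arithmetic.
def to_number(text: str) -> int:
    try:
        return int(text)
    except ValueError:
        return 0          # non-numeric strings, including "", count as 0

def add(left: str, right: str) -> str:
    # Convert both operands, compute, and convert the result back to a string.
    return str(to_number(left) + to_number(right))
```

Under this model add("3", "4") yields "7", while add("abc", "4") yields "4" without any error being raised, which is why a mistyped value can go unnoticed.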

Also noteworthy, since the introduction of entity rules, is the fact that the virtual machine works on zero-terminated strings. Entities that contain binary zeros cannot be processed directly; specific function calls are necessary to overcome this weakness. These functions need by definition to be included as base code; it is not (yet) possible to define these without including source code.

Because memory pools are used, there is a limit on the maximum size of an object. Currently, this is in the order of 4K bytes, which should not be a limitation for typical use. If your application needs larger objects, source-level tuning is necessary.

Like all other Yxorp modules, the virtual machine contains a large number of debugging hooks. If you build Yxorp with debugging enabled, performance will be somewhat impacted; however, only in very special cases would you have to worry about the performance penalty incurred by enabling debugging in the build. Unless you are running special hardware (embedded systems, router hardware, etc) or are running a very high volume site (as in, sustained throughput of over 100 Mbit) you should by default include the debug code; normal current hardware will handle Yxorp's debugging code easily. You may have to think about dealing with the debugging output though; this will easily run into gigabytes of data.