The rule language uses a syntax similar to C, but it works differently: it has its own data types and statement constructions, specifically tailored to handling requests and responses in a proxy context.
Some of the most obvious peculiarities when compared to a 'normal' programming language are:
There is only one data type: a sequence of characters (a 'string').
Literals must be enclosed in double quotes, unless every character is a digit. This makes no difference to how the literal is stored: numbers are represented in exactly the same way as strings. Number handling is not especially strong; Yxorp was not designed for number crunching.
Truth values are represented by an empty string for false and a non-empty string for true. The virtual machine and the built-in functions tend to return true as the string "T".
Variables are not declared; the parser declares them automatically 'on the fly'. In general, referencing a variable in any way creates it. Reading a non-existent variable yields an empty string.
Variable names are case insensitive.
Variable names may include a colon (':'). A variable name ending in a colon is used for variables containing header values; creating (i.e. referencing) such a variable causes Yxorp to create an HTTP header with the tag set to the variable name and the value set to the variable's value.
Variables at runtime comprise the actual value and a number of attributes; these determine the actual header name, the order of the header in the list of headers to be sent, etc.
It is not possible to define functions. Function calls are available, but all called functions must be predefined, and their runtime must be included in the virtual machine.
Since version 0.21, a rule can call another rule, but this functionality is very limited: it is not (yet) possible to pass parameters or retrieve a return value.
The language has the usual constructions for expressions and if-then-else statements. There is only one looping statement: foreach, which takes the value of an expression to control the loop. Specific to the task of processing requests and responses are the whitelist and blacklist statements.
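The value model described above can be sketched in C. This is a minimal illustration, not Yxorp's implementation: the fixed-size table and the names rule_var, rule_get, rule_is_true and rule_bool are hypothetical. Every value is a string, truth means "non-empty", names compare case-insensitively, and referencing an unknown name creates it with an empty value.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_VARS 32
#define MAX_LEN 64

/* Hypothetical variable table: every value is a string. */
struct rule_var {
    char name[MAX_LEN];
    char value[MAX_LEN];
};

static struct rule_var vars[MAX_VARS];
static int nvars = 0;

/* Variable names are case insensitive. */
static int name_eq(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* both strings must end together */
}

/* Referencing a variable creates it; a new variable holds "". */
static struct rule_var *rule_get(const char *name) {
    for (int i = 0; i < nvars; i++)
        if (name_eq(vars[i].name, name))
            return &vars[i];
    if (nvars == MAX_VARS)
        return NULL; /* table full in this sketch */
    struct rule_var *v = &vars[nvars++];
    snprintf(v->name, sizeof v->name, "%s", name);
    v->value[0] = '\0';
    return v;
}

/* Truth: any non-empty string is true, "" is false. */
static int rule_is_true(const char *value) {
    return value != NULL && value[0] != '\0';
}

/* Built-ins tend to return "T" for true and "" for false. */
static const char *rule_bool(int b) {
    return b ? "T" : "";
}
```

Note that under this rule the string "0" counts as true, because it is not empty.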
A variable consists of the following items:
The variable name, as used in the program source
The current value
The 'original name'; this is the case-sensitive token used to generate the actual header name.
The original name does not depend on the variable name in any way; however, for the RFC2616-defined headers, the variables Yxorp creates have names that are almost the same as their original names (the exception is the '-' character, which is replaced by '_').
The order number, used to specify the header order in a request or response. Yxorp sets the order number of the first header in a request to 10, incrementing by 10 for each subsequent header.
The variable's actual length.
The variable's maximum length.
A flag indicating if this variable has been truncated by the maximum length setting.
A flag marking this variable as a request header, response header, or a header set for rejecting requests.
A count field: if multiple instances of a variable exist ('duplicate dvars' or 'dupvars'), the base variable's count field is set to the highest sequence number in the set.
The first variable has its name set to the base name; subsequent variables have a sequence number appended to the base name, starting at 1 for the first duplicate.
Normally, these variables are only referenced through a foreach statement.
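The attribute list above maps naturally onto a record type. The following C sketch is hypothetical (the field names, buffer sizes and var_set are illustrative, not Yxorp's actual structures); it shows how the maximum length and the truncation flag interact when a value is stored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header group marker (request, response, reject). */
enum var_group { GRP_NONE, GRP_REQHDR, GRP_RSPHDR, GRP_REJHDR };

#define VAR_BUF 256

struct rule_var {
    char name[64];         /* name in the rule source, e.g. "user_agent:" */
    char originalname[64]; /* case-sensitive header name, e.g. "User-Agent" */
    char value[VAR_BUF];   /* current value */
    size_t length;         /* actual length of value */
    size_t maxlength;      /* 0 means "no limit" in this sketch */
    int truncated;         /* set when maxlength cut the value */
    int order;             /* header order: 10, 20, 30, ... */
    enum var_group group;  /* which header group this variable belongs to */
    int count;             /* highest dupvar sequence number */
};

/* Store a value, honouring maxlength and recording truncation. */
static void var_set(struct rule_var *v, const char *value) {
    size_t len = strlen(value);
    size_t limit = sizeof v->value - 1;
    if (v->maxlength > 0 && v->maxlength < limit)
        limit = v->maxlength;
    v->truncated = len > limit;
    if (len > limit)
        len = limit;
    memcpy(v->value, value, len);
    v->value[len] = '\0';
    v->length = len;
}
```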
All of these variables and variable attributes may be referenced or modified in a rule program; most attributes can be accessed through function calls.
Because the rule language is typeless, expressions must be written carefully. Since the only data type is a string, normal (numerical) comparison operators only work as intended if the data they operate on is exactly as expected. For instance,
i="one";
if (i==1) {
...
}
This will produce unexpected results, because the value of i cannot be converted to a number. The conversion produces a default value (in the current versions most likely 0, but this may change in a future version); the numerical == operator compares that converted value and returns false.
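The pitfall can be reproduced with a small C model of the conversion. This sketch assumes that only a plain run of digits converts and that anything else falls back to 0, as described above; the helper names are hypothetical, and Yxorp's real conversion may treat partially numeric strings (such as "12abc") differently.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert a rule value to a number. Anything that is not a plain
 * run of digits falls back to 0, mirroring the documented default. */
static long rule_to_number(const char *s) {
    char *end;
    if (s == NULL || *s < '0' || *s > '9')
        return 0;
    errno = 0;
    long n = strtol(s, &end, 10);
    if (*end != '\0' || errno == ERANGE)
        return 0;
    return n;
}

/* Numerical ==: both sides are converted first, then compared. */
static int rule_num_eq(const char *a, const char *b) {
    return rule_to_number(a) == rule_to_number(b);
}
```

The same fallback has a less obvious consequence: in this model "one" == 0 is true.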
Operators are grouped in precedence order in the usual way. Parentheses can be used to make the ordering explicit, or to change the order. The following code snippets give some examples of how this works:
i=2*3+1; // i is set to 7
i=2*(3+1); // i is set to 8
if (errorcode==302 && uri -/index\.html/) // evaluate errorcode==302,
// then evaluate uri -/index\.html/
// then do logical and
The operator precedence is given in the following table, from highest to lowest:
| Operator | Description | Associativity |
|---|---|---|
| ! | logical not | right |
| -/.../ | case-insensitive regular expression | left |
| ~/.../ | case-sensitive regular expression | left |
| *, / | numerical multiplication, division | left |
| +, - | numerical addition, subtraction | left |
| <, >, <=, >= | numerical comparison | left |
| !=, == | numerical comparison | left |
| && | logical and | left |
| \|\| | logical or | left |
| = | assignment | left |
Some 'special' variables are used by Yxorp. These are:
| Name | Use |
|---|---|
| uri | When generating requests, the uri actually sent to the server is taken from this variable. In the response stage, changing it has no effect, but the changed value is logged. |
| method | Set to the method name. For reference only; changing this variable has no effect. |
| rejectreason | Holds the text message explaining why the Yxorp base code rejected the request. |
| statuscode | Holds the status code reported by the server that processed the request. |
| errorcode | If set, the error code from this variable is reported to the client instead of the default (400 – Bad Request) when a request is rejected. |
| errormessage | If set, the error reason string from this variable is reported to the client instead of the default (400 – Bad Request) when a request is rejected. |
| errorhtml | If set, the HTML code in this variable is inserted in the reject message. |
| errortitle | If set and errorhtml is not set, this variable defines the contents of the `<title>` tag in a reject page. |
| errorrejectreason | If set and errorhtml is not set, this variable overrides the default reason string in a reject page. |
| rejectedheaders | If Yxorp rejects a header (for example because the header is unknown, overlength, or contains illegal characters), the header name is added to this variable. |
| _pattern | Set to the last executed regular expression pattern within the context of whitelist and blacklist statements. Not set for normal regular expressions. |
| _[0-9] | Set after the execution of a regular expression, depending on the pattern. Only the variables needed to hold data captured by the pattern are updated; the others are left untouched. This functionality is only available if PCRE is included in the build. |
This section is a reference of the rule program syntax.
program: statements
A program is the entirety of the source code in a single rule. A program (i.e. rule) is comprised of one or more statements.
Note that a rule cannot be empty; at least one statement must be in the rule, or the rule will fail to compile.
block: { statements }
A block is used to group one or more statements. As in many other languages, this is most often used together with other statements, such as the if statement.
statement: if (expression) statement
or if (expression) statement else statement
or foreach identifier (expression) statement
or whitelist identifier { list-elements } if-failed statement
or blacklist identifier { list-elements } if-failed statement
or expression
or block
or return
There are several statements, as listed here. Since a block can contain statements, a statement that contains another statement can by extension also contain a block, and thus more statements.
if: if (expression) statement or if (expression) statement else statement
The if statement works exactly as you would expect. Note that the expression is expected to result in a truth value. In rules, true is anything that is not an empty string; false is an empty string.
The simple if statement (without else) can also be written as follows:
if-shorthand: '?' expression statement
foreach: foreach identifier (expression) statement
The foreach statement implements loops. The expression in the statement is expected to result in a space separated list. The foreach statement loops once for each element in the list, setting the variable named in the identifier to the list element.
Several functions deliver lists of variable names in this form for use with foreach, such as enumerate_dupvar, enumerate_reqhdr and enumerate_rsphdr.
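The loop semantics of foreach can be modelled in C as "split on spaces, run the body once per element". This sketch treats a run of spaces as a single separator, which is an assumption; rule_foreach, foreach_body and append_element are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef void (*foreach_body)(const char *element, void *ctx);

/* Call body once per space-separated element of list; returns the
 * number of iterations. Overlong elements are truncated. */
static int rule_foreach(const char *list, foreach_body body, void *ctx) {
    char elem[256];
    int n = 0;
    const char *p = list;
    while (*p) {
        while (*p == ' ')
            p++; /* skip separators */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, " ");
        if (len >= sizeof elem)
            len = sizeof elem - 1;
        memcpy(elem, p, len);
        elem[len] = '\0';
        body(elem, ctx); /* the loop variable is set to elem */
        n++;
        p += strcspn(p, " ");
    }
    return n;
}

/* Example body: append each element to a 256-byte comma-separated buffer. */
static void append_element(const char *element, void *ctx) {
    char *out = ctx;
    if (out[0] != '\0')
        strncat(out, ",", 255 - strlen(out));
    strncat(out, element, 255 - strlen(out));
}
```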
whitelist: whitelist identifier { list-elements } if-failed statement
list-elements: -/regexp/ // simple, case insensitive
or ~/regexp/ // simple, case sensitive
or -/regexp/ : list-statement // complex, case insensitive
or ~/regexp/ : list-statement // complex, case sensitive
list-statement: statement + continue-list
The whitelist statement is used to easily check the value of a variable against a set of regular expressions. If a match is found, the rest of the whitelist statement is skipped. If no match is found, the statement following the if-failed keyword is executed.
All forms of list-elements may be used in the same whitelist. There are no syntax restrictions to code formatting, but normal practice is to write each regular expression on its own line.
If a regular expression is in the complex form, the statement following the regular expression is executed before the normal whitelist action. In this context, a special statement is available with the continue-list keyword. If the continue-list statement is executed, the whitelist is resumed as if no match had occurred; this can be used to handle exception cases.
Normally, the if-failed clause is used to stop execution, reject a request, etc; however, note that this must be done explicitly, since the rule program language does not execute a default action.
After the whitelist completes, the special variable _pattern is set to the last executed pattern in a whitelist. This can be used to determine which pattern in the list matched. If whitelists are nested, the fact that _pattern contains the last executed pattern, not the successful match, may produce unexpected results.
Note that the scope in which _pattern is valid is different in a whitelist and a blacklist.
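The whitelist control flow can be sketched in C with the POSIX <regex.h> API (regcomp/regexec with extended syntax, i.e. roughly the libc variant rather than PCRE). The struct and function names are hypothetical; *last_pattern mimics _pattern, holding the last pattern executed rather than necessarily the one that matched.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* One simple list element: -/re/ (icase = 1) or ~/re/ (icase = 0). */
struct list_elem {
    const char *re;
    int icase;
};

/* Returns 1 when some pattern matches (the rest is skipped), or 0
 * when none does, in which case the caller runs if-failed. */
static int whitelist(const char *value, const struct list_elem *list,
                     size_t n, const char **last_pattern) {
    for (size_t i = 0; i < n; i++) {
        regex_t rx;
        int flags = REG_EXTENDED | REG_NOSUB | (list[i].icase ? REG_ICASE : 0);
        *last_pattern = list[i].re;
        if (regcomp(&rx, list[i].re, flags) != 0)
            continue; /* this sketch skips patterns that fail to compile */
        int hit = regexec(&rx, value, 0, NULL, 0) == 0;
        regfree(&rx);
        if (hit)
            return 1;
    }
    return 0;
}
```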
blacklist: blacklist identifier { list-elements } if-failed statement
list-elements: -/regexp/ // simple, case insensitive
or ~/regexp/ // simple, case sensitive
or -/regexp/ : list-statement // complex, case insensitive
or ~/regexp/ : list-statement // complex, case sensitive
list-statement: statement + continue-list
As with the whitelist statement, the blacklist statement is used to easily check the value of a variable against a set of regular expressions. If a match is found, the statement following the if-failed keyword is executed. If no match occurs, the if-failed clause is not executed.
All forms of list-elements may be used in the same blacklist. There are no syntax restrictions to code formatting, but normal practice is to write each regular expression on its own line.
If a regular expression is in the complex form, the statement following the regular expression is executed before the normal blacklist action. In this context, a special statement is available with the continue-list keyword. If the continue-list statement is executed, the blacklist is resumed as if no match had occurred; this can be used to handle exception cases.
Normally, the if-failed clause is used to stop execution, reject a request, etc; however, note that this must be done explicitly, since the rule program language does not execute a default action.
After the blacklist executes, and within the context of the if-failed clause, the _pattern special variable contains the pattern that last executed. If blacklists are nested, the fact that _pattern contains the last executed pattern, not the successful match, may produce unexpected results.
Note that the scope in which _pattern is valid is different in a whitelist and a blacklist.
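The blacklist semantics, including the complex form with continue-list, can be sketched in C using POSIX <regex.h>. Here the per-pattern statement is modelled as an optional callback whose return value 1 stands for "continue-list executed"; all names, including the example exception allow_known_uri, are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Optional per-pattern statement; returning 1 models continue-list. */
typedef int (*list_stmt)(const char *value);

struct list_elem {
    const char *re;
    int icase;      /* 1 for -/re/, 0 for ~/re/ */
    list_stmt stmt; /* NULL for the simple form */
};

static int regex_hit(const char *re, int icase, const char *value) {
    regex_t rx;
    int flags = REG_EXTENDED | REG_NOSUB | (icase ? REG_ICASE : 0);
    if (regcomp(&rx, re, flags) != 0)
        return 0;
    int hit = regexec(&rx, value, 0, NULL, 0) == 0;
    regfree(&rx);
    return hit;
}

/* Returns 1 (run if-failed) on the first match whose statement does
 * not execute continue-list, 0 otherwise. */
static int blacklist(const char *value, const struct list_elem *list,
                     size_t n, const char **last_pattern) {
    for (size_t i = 0; i < n; i++) {
        *last_pattern = list[i].re; /* models _pattern */
        if (!regex_hit(list[i].re, list[i].icase, value))
            continue;
        if (list[i].stmt != NULL && list[i].stmt(value))
            continue; /* continue-list: resume as if no match */
        return 1;
    }
    return 0;
}

/* Example complex-form statement: treat one known URI as an exception. */
static int allow_known_uri(const char *value) {
    return strcmp(value, "/legacy/../index.html") == 0;
}
```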
return: return or return (expression)
The return statement ends the execution of the current rule program. Either version can be used, but the value of the expression in a return statement is currently not used. This may change in future versions.
The forms that an expression may have are listed below:
expression: call(parameters) // function call
or literal // value
or number // integral number value
or identifier = expression // assignment
or identifier // variable
or $identifier // indirect variable
or ! expression // logical not
or expression && expression // logical and
or expression || expression // logical or
or expression + expression // numerical addition
or expression - expression // numerical subtraction
or expression * expression // numerical multiplication
or expression / expression // numerical division
or expression == expression // numerical compare equal
or expression != expression // numerical compare not equal
or expression < expression // numerical compare less
or expression > expression // numerical compare greater
or expression <= expression // numerical compare less or equal
or expression >= expression // numerical compare greater or equal
or (expression) // order expression priority
or identifier -/regexp/ // case insensitive regexp
or identifier ~/regexp/ // case sensitive regexp
Most expression forms are very common, and will not be detailed further.
Note that the numerical operations are only valid if the operands are actually numbers.
identifier: letter { letter | digit | "_" | ":" }
Identifiers are used for variable names. They must start with a letter, and may contain digits, underscores, and colons. Colons should only be used at the end, and signify that the identifier points to a variable that is used to store the contents of a header.
Internally, identifiers are also used for function names, and will be used in future versions to refer to other rules.
number: digit { digit }
Numbers are formed as one or more digits. Decimal points (i.e. floating point numbers) are not supported, and neither are negative numbers.
In the internal representation, all numbers are stored as strings. There is no difference between 0 and "0" in the rule syntax.
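The identifier and number productions above translate directly into two small checks. This C sketch follows the grammar literally (so it accepts a colon anywhere after the first letter, even though colons should only appear at the end); the function names are hypothetical.

```c
#include <ctype.h>

/* identifier: letter { letter | digit | "_" | ":" } */
static int is_identifier(const char *s) {
    if (!isalpha((unsigned char)*s))
        return 0;
    for (s++; *s; s++)
        if (!isalnum((unsigned char)*s) && *s != '_' && *s != ':')
            return 0;
    return 1;
}

/* number: digit { digit }  (no sign, no decimal point) */
static int is_number(const char *s) {
    if (!isdigit((unsigned char)*s))
        return 0;
    for (s++; *s; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}
```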
literal: '"' { character } '"'
Literals are values expressed directly in the source, as in "index.html". Think of them as strings.
indirect variable : $identifier
An indirect variable holds the name of another variable. If it is used without the indirection operator '$', the variable itself is addressed; if the indirection operator is used, the variable's value is taken as the name of another variable, and $identifier evaluates to that other variable's value.
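The two-step lookup behind $identifier can be sketched in C. For brevity this hypothetical model compares names case-sensitively (real variable names are case insensitive) and reads unknown names as the empty string.

```c
#include <stddef.h>
#include <string.h>

struct var {
    const char *name;
    const char *value;
};

/* Plain lookup: identifier. Unknown names read as "". */
static const char *lookup(const struct var *vars, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].value;
    return "";
}

/* Indirect lookup: $identifier. The value of name is itself used as
 * the name of the variable to read. */
static const char *lookup_indirect(const struct var *vars, size_t n,
                                   const char *name) {
    return lookup(vars, n, lookup(vars, n, name));
}
```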
regular expression: identifier -/regexp/ // case insensitive regexp or identifier ~/regexp/ // case sensitive regexp
Regular expression support depends on the regular expression library included in the build (libc or PCRE). With the POSIX libc variant, only basic regexps are available; see the man pages for regcomp, regexec, or regex for a description of what this library supports. Yxorp uses this variant only if PCRE could not be found on your system during configuration. PCRE is highly recommended, as it has much more functionality and is more consistent across different types of system. If you are not sure which library you have, run yxorp -V to see how your build is configured.
If you included PCRE, you can use most constructs that are possible in Perl. Extraction of matched data from the original string is supported using the _0, _1, up to _9 special variables. Note that as in Perl, only the variables that were actually necessary for storing matched data are updated.
Both case-sensitive and case-insensitive variants of the regexp match are available: use a tilde '~' for a case-sensitive match or a minus '-' for a case-insensitive one. The regexp itself must be enclosed in slashes. No implicit start or end anchor is added to the regexp; to match against the start or end of a string, use ^ and $, respectively.
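The capture behaviour (only the _N slots that actually captured data are updated) can be imitated with POSIX <regex.h>, which marks unused groups with rm_so == -1. This is a hypothetical C model, not Yxorp's PCRE code path; the captures array stands in for _0 to _9, and treating _0 as the whole match is an assumption based on the usual regex convention.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define NCAPT 10
#define CAPT_LEN 128

/* captures[i] models the special variable _i. */
static char captures[NCAPT][CAPT_LEN];

/* Match value against re (icase = 1 for -/re/, 0 for ~/re/). On
 * success, update only the slots whose group captured something. */
static int rule_regex(const char *value, const char *re, int icase) {
    regex_t rx;
    regmatch_t m[NCAPT];
    int flags = REG_EXTENDED | (icase ? REG_ICASE : 0);
    if (regcomp(&rx, re, flags) != 0)
        return 0;
    int hit = regexec(&rx, value, NCAPT, m, 0) == 0;
    if (hit) {
        for (int i = 0; i < NCAPT; i++) {
            if (m[i].rm_so == -1)
                continue; /* group unused: keep the old contents */
            int len = (int)(m[i].rm_eo - m[i].rm_so);
            snprintf(captures[i], CAPT_LEN, "%.*s", len, value + m[i].rm_so);
        }
    }
    regfree(&rx);
    return hit;
}
```

The ^ and $ anchors in the usage below are needed because, as noted above, no implicit anchoring takes place.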
basic_auth_check checks whether valid basic authentication credentials are present in the request (i.e. an Authorization: Basic header). If they are not, the request is rejected with a 401 status code (normally causing a browser to show the userid/password dialog). The realm (text) is shown in this dialog window, and is also checked against the basic authentication credentials table.
If authentication succeeds, true is returned; otherwise false is returned and the request is rejected. Note that execution of the rule program does not stop when basic_auth_check fails; this must be done explicitly, normally with a return statement that prevents execution of the statements following the call to basic_auth_check. Typical use is:
// check if basic authentication credentials are set
if (!basic_auth_check("my-realm", "local")) {
return; // end the rule, so that the implicit reject
// from basic_auth_check is processed
}
// reach here if basic_auth_check was successful
The second parameter to the basic_auth_check function must be exactly "local". This refers to the use of an internal table for the authentication credentials. Future versions may support other tables; this parameter will then indicate which table should be used.
clientinrange checks whether the IP address that the client uses on this connection is part of the IPv4 range passed to it. If the client address or the range cannot be parsed, false is returned. Note that the client address cannot be parsed if it is in IPv6 format, as happens when the request comes in through an IPv6 listener. The range must be in the format a.b.c.d/x, where 1<=x<=32.
Typical use is as follows:
// check where the request comes from
if (clientinrange("127.0.0.1/32")) {
// allow some things
} else if (clientinrange("192.0.2.0/24")) {
// allow some other things
} else {
// don't allow things
reject("sorry...");
}
Similar to clientinrange, but for IPv6 addresses. The same limitations apply in reverse: IPv4 clients coming in through an IPv4-mode listener cannot be processed correctly by this function.
The range can be specified in the following formats:
x:x:x:x:x:x:x:x/y // default
x::x/y // missing parts are set to zeros
::a.b.c.d/y // deprecated transitional form
::ffff:a.b.c.d/y // IPv4-mapped IPv6 address
::1/y // localhost
where x is a hexadecimal group of up to 4 digits, y is decimal with 1<=y<=128, and a.b.c.d is an IPv4 address range.
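Both range checks amount to a prefix comparison on the binary address. The following C sketch uses the POSIX inet_pton function for parsing; client_in_range and prefix_match are hypothetical helpers (the real functions take only the range and use the connection's client address). It reproduces the documented failure modes: an address of the other family, or an out-of-range prefix length, yields false.

```c
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Compare the first `bits` bits of two binary addresses. */
static int prefix_match(const unsigned char *a, const unsigned char *b, int bits) {
    int full = bits / 8, rest = bits % 8;
    if (memcmp(a, b, (size_t)full) != 0)
        return 0;
    if (rest == 0)
        return 1;
    unsigned char mask = (unsigned char)(0xFF << (8 - rest));
    return (a[full] & mask) == (b[full] & mask);
}

/* family is AF_INET (a.b.c.d/x, 1<=x<=32) or AF_INET6 (1<=y<=128).
 * Anything that cannot be parsed returns false (0). */
static int client_in_range(int family, const char *client, const char *range) {
    unsigned char c[16], r[16];
    char net[64];
    const char *slash = strchr(range, '/');
    int maxbits = family == AF_INET ? 32 : 128;
    if (slash == NULL || (size_t)(slash - range) >= sizeof net)
        return 0;
    memcpy(net, range, (size_t)(slash - range));
    net[slash - range] = '\0';
    char *end;
    long bits = strtol(slash + 1, &end, 10);
    if (*end != '\0' || bits < 1 || bits > maxbits)
        return 0;
    if (inet_pton(family, client, c) != 1 || inet_pton(family, net, r) != 1)
        return 0;
    return prefix_match(c, r, (int)bits);
}
```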
clientstate retrieves values from the client state, depending on the string passed to it:
"toclient": the number of bytes sent to the client on this clientstate (cumulative)
"toserver": the number of bytes sent to the server on this clientstate (cumulative)
"fromclient": the number of bytes received from the client on this clientstate (cumulative)
"fromserver": the number of bytes received from the server on this clientstate (cumulative)
"hitcount": the number of requests processed on this clientstate
"id": the cookie value for this clientstate
concat takes a variable number of arguments and concatenates them. The concatenated string is returned.
contains is equivalent to the C function strstr. It takes two arguments: the first is the haystack, the second the needle. It returns true if the needle is found in the haystack, and false otherwise. The comparison is case sensitive.
Note you can also use a regular expression; this is a lot more flexible.
contains_characters takes two arguments; the first is a string, which is tested against a list (which is also a string, by the way). If any of the characters in the list occur in the string, true is returned; false otherwise. The comparison between characters is case sensitive.
containscase is equivalent to the C function strcasestr (if that exists on your platform). It takes two arguments: the first is the haystack, the second the needle. It returns true if the needle is found in the haystack, and false otherwise. The comparison is case insensitive.
Note you can also use a regular expression; this is a lot more flexible.
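The three string tests above map onto standard C as follows: contains onto strstr, contains_characters onto strpbrk, and containscase onto a case-insensitive search written by hand, since strcasestr is a non-standard extension. The rule_* names are hypothetical; each returns "T" or "" to follow the rule language's truth convention.

```c
#include <ctype.h>
#include <string.h>

/* contains: case-sensitive substring test. */
static const char *rule_contains(const char *haystack, const char *needle) {
    return strstr(haystack, needle) != NULL ? "T" : "";
}

/* contains_characters: true if any character of list occurs in s. */
static const char *rule_contains_characters(const char *s, const char *list) {
    return strpbrk(s, list) != NULL ? "T" : "";
}

/* containscase: case-insensitive substring test. */
static const char *rule_containscase(const char *haystack, const char *needle) {
    size_t n = strlen(needle);
    for (const char *h = haystack; ; h++) {
        size_t i = 0;
        while (i < n && h[i] != '\0' &&
               tolower((unsigned char)h[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return "T"; /* whole needle matched at h */
        if (*h == '\0')
            return "";  /* reached the end of the haystack */
    }
}
```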
enumerate_dupvar takes, as its only argument, a string containing the name of a variable. It returns a space separated string containing the real variable names of all variables in a duplicate variable set. If the variable is singular (i.e. there are no other members in the duplicate variable set) only the variable's name is returned.
enumerate_reqhdr takes no arguments. It returns a space separated string containing the variable names of all variables that have the REQHDR attribute set. Yxorp sets this attribute for all variables it creates to represent request headers.
enumerate_rsphdr takes no arguments. It returns a space separated string containing the variable names of all variables that have the RSPHDR attribute set. Yxorp sets this attribute for all variables it creates to represent response headers.
equal performs a case sensitive comparison of two variables.
equal performs a case insensitive comparison of two variables.
findheader takes one parameter: the exact, case-sensitive name of a header. It returns the value of this header in the current request or response, or false if it is not found. findheader may only be used in request or response rules; the results are undefined for other rule types. The header is not processed in any way, and the settings in the <header> tags in the <globalconfiguration> section do not apply.
getattr tests a variable for a specific attribute. The variable name must be set as the first parameter; the exact attribute as the second. If the attribute is present, true will be returned; if not, false will be returned.
getclientip returns the IP address (IPv4 or IPv6) of the client associated with the current request.
getclientip returns the domain name of the client associated with the current request. Note that using this function causes a domain name lookup to be done; this may impact performance, especially for large volumes.
getclientstatedvar reads a variable from the client state, if one exists for this request. The variable name must be set as the first parameter. If the variable and the client state exist, the data will be returned; otherwise, an empty string (i.e. false) will be returned.
getlength returns the length of a variable. The name of the variable is passed to getlength.
getlistenerid returns the id (i.e. name) of the listener that has received the request that is currently being processed.
getmaxcount returns the number of variables in a duplicate variable group.
getmaxlength returns the maxlength attribute of a variable.
getorder returns the order attribute of a variable.
getorder returns the originalname attribute of a variable.
getserverturnaround returns the time, in milliseconds, from sending the request to the server until the response header has been completely received. This gives an indication of server responsiveness; note, however, that in some cases it differs considerably from the response time an end user experiences.
getsslbackend returns the value of the sslbackend flag. If set, this means Yxorp will use SSL to the server; if not set, Yxorp will use plain HTTP.
getsticky returns the value of the sticky flag.
issslsession returns true if the client session uses SSL; false otherwise.
Redirects the request by rejecting it with status code 307 (Temporary Redirect), passing the URL in a Location: header. This causes a browser to redirect to that URL.
Rejects the request with the specified reason. The reason string is also logged to the error log.
Sets the attribute on the specified variable. The following attribute values can be set:
RSPHDR: this variable is part of the response header group.
REQHDR: this variable is part of the request header group.
REJHDR: this variable is part of the reject header group.
setclientstatedvar sets a variable in the client state, if one exists for this request. The variable name must be passed as the first parameter and its value as the second. Note that client states are created after a request-type rule runs, so on the first request from a client the client state is not yet available. Also note that the number of slots in the client state dvar table is normally very limited (it can be increased in the global configuration).
This function sets the domain part of the state cookie that Yxorp uses for tracking or sticky load balancing. If no domain is set, the generally accepted behavior of clients is to only send back the cookie values to the same domain that set the cookie. If you use the domain part, you can cause the cookie values to be sent to several hostnames in a domain group (i.e. set the domain to .example.org to have the cookie sent to www.example.org, www2.example.org, etc).
Sets the next instance of the duplicate variable to the specified value.
If, for example, a dupvar exists:
ex=an ex1=example
the call setdup("ex", "test") would have the following result:
ex=an ex1=example ex2=test
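The dupvar naming scheme (base name, then base name plus sequence number starting at 1) can be modelled in C to reproduce the setdup example above and the list format that enumerate_dupvar returns. The dupset structure and both functions are hypothetical illustrations, not Yxorp's storage layout.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_DUP 8

/* Hypothetical dupvar set: members are base, base1, base2, ... */
struct dupset {
    char base[32];
    char values[MAX_DUP][64];
    int used; /* number of members in the set */
};

/* setdup: append the next instance; returns its sequence number or -1. */
static int setdup(struct dupset *d, const char *value) {
    if (d->used == MAX_DUP)
        return -1;
    snprintf(d->values[d->used], sizeof d->values[0], "%s", value);
    return d->used++;
}

/* enumerate_dupvar: space-separated member names, e.g. "ex ex1 ex2".
 * Output is truncated (but still terminated) if size is too small. */
static void enumerate_dupvar(const struct dupset *d, char *out, size_t size) {
    size_t pos = 0;
    out[0] = '\0';
    for (int i = 0; i < d->used && pos < size; i++) {
        int w = i == 0
            ? snprintf(out + pos, size - pos, "%s", d->base)
            : snprintf(out + pos, size - pos, " %s%d", d->base, i);
        if (w < 0)
            break;
        pos += (size_t)w;
    }
}
```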
Sets the maximum length attribute on the specified variable (after this, setting a longer value in a variable results in the value being truncated).
Sets the order attribute on the specified variable.
Sets the originalname attribute on the specified variable.
Forces the server session for this request to use SSL.
Sets the sticky flag for this request, causing sticky scheduling and client tracking for this request.
The text message for the HTTP response code is returned.
The message is written to the trace log. If no trace log is active, the message is discarded.
Stops forcing the server session for this request to use SSL.
Clears the sticky flag for this request, disabling sticky scheduling and client tracking for this request.
yesno takes the truth value of its argument and returns "yes" for true or "no" for false.
The rule program sources are compiled, and the generated code is executed by a virtual machine. The virtual machine implements a stack machine (like an RPN desk calculator) working on strings.
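A string-based stack machine can be sketched in a few lines of C. The opcodes and the instr structure below are hypothetical, not Yxorp's instruction set; the sketch evaluates the earlier example i=2*(3+1) in postfix form and shows the string-to-number-to-string round trip that makes arithmetic comparatively expensive. atol returns 0 for non-numeric input, which matches the documented default.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 16
#define SLOT_LEN 32

enum op { OP_PUSH, OP_ADD, OP_MUL };

struct instr {
    enum op op;
    const char *arg; /* used by OP_PUSH only */
};

/* Evaluate a postfix program; every stack slot holds a string.
 * Returns the single result, or NULL on stack errors. */
static const char *run(const struct instr *prog, size_t n, char stack[][SLOT_LEN]) {
    int sp = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == OP_PUSH) {
            if (sp == STACK_MAX)
                return NULL;
            snprintf(stack[sp++], SLOT_LEN, "%s", prog[i].arg);
            continue;
        }
        if (sp < 2)
            return NULL;
        long b = atol(stack[--sp]); /* string -> number */
        long a = atol(stack[--sp]);
        long r = prog[i].op == OP_ADD ? a + b : a * b;
        snprintf(stack[sp++], SLOT_LEN, "%ld", r); /* number -> string */
    }
    return sp == 1 ? stack[0] : NULL;
}
```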
In general, the virtual machine is surprisingly fast. Most of the instructions it processes are very simple and straightforward. The rule compiler is simple and does not optimize, but this has little impact on the execution speed of rule programs. Most data manipulations are accomplished just by moving pointers around. Temporary variables are allocated in scratch memory pools; this speeds up execution because memory allocation is much faster this way, and it also ensures that all pointers in use remain valid for the duration of the VM run, even if memory shortages occur.
There is, of course, also a weak point: the virtual machine is not especially good at arithmetic. Since all data is represented as strings, calculations require several conversions. Number handling is also quite simplistic: if a non-numeric string is used in a numeric operation, its value defaults to 0 instead of causing an error (this behavior is likely to change in a future version).
Because of the memory pools, there is a limit to the maximum size of an object. Currently this is in the order of 4K bytes, which should not be a limitation for typical use. If your application needs larger objects, source-level tuning is necessary.
Like all other Yxorp modules, the virtual machine contains many debugging hooks. If you build Yxorp with debugging enabled, performance will be significantly lower.
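A scratch pool of this kind is typically a bump allocator: allocation just advances an offset, and the whole pool is released at once after the run, so pointers stay valid until then. The following C sketch is hypothetical (Yxorp's actual pool layout is not described here); its pool is 4096 bytes, so in this model no single object can exceed roughly 4K. It hands out unaligned byte storage, which is sufficient for strings.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4096

/* Hypothetical scratch pool: a bump allocator reset after each run. */
struct pool {
    unsigned char buf[POOL_SIZE];
    size_t used;
};

/* Allocation only moves an offset; NULL if the object does not fit. */
static void *pool_alloc(struct pool *p, size_t size) {
    if (size > POOL_SIZE - p->used)
        return NULL;
    void *mem = p->buf + p->used;
    p->used += size;
    return mem;
}

static char *pool_strdup(struct pool *p, const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = pool_alloc(p, len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Everything is released at once; there is no per-object free. */
static void pool_reset(struct pool *p) {
    p->used = 0;
}
```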