Cgilite

CGI programs are executables run by a web server to produce a dynamic website. In most cases CGI programs are expected to produce an HTML document on stdout. They can retrieve input and environment variables from the web server, and thus react to the input of a website user.

CGI programs are commonly written in languages like PHP, Perl, Python, Ruby, NodeJS, etc. It is of course possible to write CGI programs in any programming language.

Cgilite aims to enable production and testing of CGI programs in Unix shell script. Because a shell interpreter is available on virtually all Unix-like systems, this makes it practical to write web applications with very minimal requirements. Cgilite-based applications can run on busybox-based systems, which are common on embedded hardware. Even on a "proper" web server they have the advantage of few dependencies on the system installation, interpreter programs, and libraries.

Cgilite aims to be portable, so applications can run with different implementations of shell and userland tools (e.g. GNU, BSD, Busybox, etc.).

Usage

Cgilite should be included as part of your project. For Git projects I suggest using the git subtree feature, i.e. to include the software in a cgilite/ subfolder within your own project use:

git subtree add --squash -P cgilite https://git.plutz.net/git/cgilite master

If you are not using git for your own project, just clone:

git clone https://git.plutz.net/git/cgilite

You can also clone to /opt and include files from there.

For the rest of this documentation we will assume that cgilite resides in a subfolder of your own project.

Writing a program

#!/bin/sh

##  Include CGIlite
. cgilite/cgilite.sh 

##  Read POST data (can be anywhere in the program)
name="$(POST name)"

##  CGI programs should produce necessary header fields
##  (must happen at the start of the output)
printf 'Content-Type: text/html\r\n'
printf '\r\n'  ##  An additional newline starts the output content

##  Heredocs are a convenient way to produce longer output containing shell expansions
cat <<-END
	<!DOCTYPE HTML>
	<HTML><HEAD>
	  <TITLE>Demo</TITLE>
	</HEAD><BODY>
	END

if [ "$name" ]; then
  printf '<P>Hello my dear %s</P>' "$(HTML "${name}")"
else
  cat <<-END
	  <FORM METHOD="POST" ACTION="${PATH_INFO}">
	    <LABEL>What's your name? <INPUT TYPE="TEXT" NAME="name"></LABEL>
	    <BUTTON TYPE="submit">Submit</BUTTON>
	  </FORM>
	END
fi

printf '</BODY></HTML>'

Serving the program

Cgilite can be served via the CGI function of a web server. Cgilite also includes an HTTP responder and can be served inetd-style, so that a full web server is not needed.

Via Busybox netcat

~$ busybox nc -llp 1080 -e /opt/app/index.cgi

Via inetd

# <service_name>	<sock_type>	<proto>	<flags>	<user>	<server_path>		<args>
1080			stream		tcp	nowait	user	/opt/app/index.cgi	/opt/app/index.cgi

In Apache httpd

<VirtualHost *:80>
  ServerName example.com

  <Location />
    Require	All	granted
  </Location>

  ScriptAlias	/	/opt/app/index.cgi/
</VirtualHost>

Functions and Variables

Cgilite is split into different modules, e.g. for session handling, file serving, etc. Modules can be included into your program as needed.

Core Module

All core functions reside in cgilite.sh.

Variable $_EXEC

Should be set to the directory from which your application is served, i.e. where your index.cgi is located. If this variable is not set, cgilite will try to determine it from the shell's argument $0, though this method may be unreliable.

It is recommended to set this as an environment variable before executing your program.

Your program should respect this variable whenever referring to files from the application directory. All cgilite functions will do that.
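
As an illustration of the $0 fallback mentioned above, a program could compute a default for itself when $_EXEC is not set. This is a minimal sketch and not cgilite's actual logic; the helper name dir_of and the file name style.css are hypothetical.

```shell
# Hypothetical helper: print the directory part of a script path.
dir_of() (
  case "$1" in
    */*) d="${1%/*}"               # strip the last path segment
         printf '%s\n' "${d:-/}" ;; # a script directly below / yields /
    *)   printf '%s\n' . ;;        # no slash: fall back to the current directory
  esac
)

# Keep a value from the environment; otherwise guess from $0.
_EXEC="${_EXEC:-$(dir_of "$0")}"

# Refer to application files relative to $_EXEC (example file name).
stylesheet="${_EXEC}/style.css"
```

The guess is only as good as $0, which is why setting the variable in the environment remains the recommended approach.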

Variable $_DATA

Should be set to the directory where your application stores its data. Data should not be stored in the execution directory, ever. Separating data and execution directory enables multi-site setups, where a single installation of your application can serve multiple sites / domains / vhosts, each with their own distinct data sets. This variable defaults to ., i.e. the current working directory when the program is executed.

It is recommended to set this as an environment variable before executing your program.

Your program should respect this variable whenever referring to files from the data directory. All cgilite functions will do that.

Variable $_BASE

This is an optional variable which you can use when serving an application at a subdirectory of a domain. E.g. if you are running some website at example.com and your application will be served at example.com/myapp/, then this variable should be set to /myapp. It is empty by default.

All cgilite functions will respect this variable when generating hyperlinks and HTTP-redirects.
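
Your own code can follow the same convention when it builds links. The following is a minimal sketch; app_link is a hypothetical helper and not a cgilite function.

```shell
# Hypothetical helper: prefix an application-local path with $_BASE.
app_link() {
  printf '%s%s\n' "${_BASE:-}" "$1"
}

_BASE=/myapp
app_link /somepage    # prints /myapp/somepage
```

With an empty $_BASE the same call simply prints /somepage, so the application works unchanged at the root of a domain.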

$CR and $BR

Those are convenience variables containing a single carriage return (\r, 0x0d) and a single line break (\n, 0x0a) character respectively. You can use them in shell patterns for the case statement and substring expansions. Do not try to override them!
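
For example, a header line sent by a client ends in a carriage return, which a substring expansion can strip. This is a minimal sketch; has_linebreak is a hypothetical helper. Outside of cgilite the two variables do not exist, so the sketch defines stand-ins only when they are missing (an assumption made so that the example runs on its own).

```shell
# Stand-ins for a standalone run; in a cgilite program these are already set.
[ -n "${CR:-}" ] || CR="$(printf '\r')"
[ -n "${BR:-}" ] || BR='
'

line="Host: example.com${CR}"

# Substring expansion: drop one trailing carriage return.
line="${line%"$CR"}"

# Shell pattern in a case statement: detect an embedded line break.
has_linebreak() {
  case "$1" in
    *"$BR"*) return 0 ;;
    *)       return 1 ;;
  esac
}
```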

Webserver Variables

The variables in this list should be set by any CGI web server. If a cgilite application is executed in inetd mode, cgilite will set those variables. If the application is executed by a web server, you will have to rely on its understanding of the CGI specification, especially when running on a commercial web hoster.

The variable $REQUEST_METHOD is of special importance. If this variable is unset or empty when the application is launched, cgilite will assume it is being run in inetd mode. It will then begin reading HTTP headers and acting as a web server. If on the other hand $REQUEST_METHOD is set to any value, cgilite will assume it is being run by an external web server. In this case it will not handle request information, and it will rely on the web server to provide all the variables described in this section.
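
The rule above boils down to a single test on $REQUEST_METHOD. The following sketch restates that rule for illustration only; it is not cgilite's actual code, and serving_mode is a hypothetical name.

```shell
# Hypothetical helper: report which mode the rule above would select.
serving_mode() {
  if [ -z "${REQUEST_METHOD:-}" ]; then
    printf 'inetd\n'   # unset or empty: the HTTP request must be read from stdin
  else
    printf 'cgi\n'     # any value: an external web server already parsed the request
  fi
}
```

One consequence is that a manual test run with REQUEST_METHOD set in the environment has to provide the remaining variables of this section itself.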

  • $REMOTE_ADDR

    The address of the remote client. When running via inetd this is determined from $TCPREMOTEIP. Otherwise this may contain a DNS name, an IPv4 address, an IPv6 address, nothing, or something surprising.

  • $SERVER_NAME

    The address of the webserver running your application. When running via inetd this is determined from $TCPLOCALIP. Otherwise this may contain a DNS name, an IPv4 address, an IPv6 address, nothing, or something surprising.

  • $SERVER_PORT

    The TCP port on which your application is served. When running via inetd this is determined from $TCPLOCALPORT. Web servers will usually set this to either 80 or 443.

    In general you should not rely on the connection variables, as web hosters may run your application through a reverse proxy, and in some cases (e.g. when running through fcgiwrapper or similar) it will not even make sense to put anything in those variables.

  • $REQUEST_METHOD

    Most often GET or POST. Sometimes HEAD, PUT, etc.

  • $REQUEST_URI

    The entire local part of the web address being called, e.g. /myapp/somepage?some=parameter

  • $SERVER_PROTOCOL

    Usually HTTP/1.1. It is safest not to care about this.

  • $PATH_INFO

    The local path of the web address being called, without the QUERY_STRING, e.g. /myapp/somepage. Despite being one of the most important variables for your application, web servers may mess this up thoroughly when running via ScriptAlias or similar. Usually you can figure out some workaround. Good luck with commercial hosters!

  • $QUERY_STRING

    The part after the ? of the address being called. Contains all the GET variables. Use the GET functions when you want to read data from it. You may keep this variable around for use in redirects, etc.

  • $CONTENT_LENGTH

    Used in POST and PUT operations to determine the length of the content uploaded by the client.

  • $CONTENT_TYPE

    Also used whenever the client uploads data. Most notable are application/x-www-form-urlencoded, the usual case, where you can use the POST functions to read data, and multipart/form-data, which is used in file uploads via web form.
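
A program that accepts uploads might branch on $CONTENT_TYPE before deciding how to read the request body. This is a minimal sketch; body_kind is a hypothetical helper, and the labels it prints are made up for this example.

```shell
# Hypothetical helper: classify the request body by its content type.
body_kind() {
  case "${CONTENT_TYPE:-}" in
    application/x-www-form-urlencoded*) printf 'form\n' ;;      # may carry "; charset=..."
    multipart/form-data*)               printf 'multipart\n' ;; # carries "; boundary=..."
    '')                                 printf 'none\n' ;;      # no body announced
    *)                                  printf 'other\n' ;;
  esac
}
```

The trailing * in the patterns matters because clients append parameters such as the multipart boundary to the media type.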

Function PATH

Normalizes a file path. Use this to sanitize user input.

There are two usage modes.

  • Call via parameter:

    ~$ sanepath="$(PATH "$inputpath")"

  • Call via stdin:

    ~$ sanepath="$(printf %s "$inputpath" | PATH)"

    This allows you to chain different text functions together (e.g. read directly from the HEX_DECODE function, etc.)

The resulting path will always start with a /, even if the input is an empty string. The resulting path will never contain . or .. segments. This function is agnostic to directories in the file system, and resulting paths may not make sense in regard to file system objects. All .. segments within the input path will be collapsed to the parent directory within the input path, but will never ascend beyond the segments of the input path. The function is suited for constraining user input to secure values; it is not especially suited for making sense of actual file system paths.
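
The normalization rules above can be illustrated with a small stand-alone function. This is a sketch of the described behaviour, assuming that empty segments and trailing slashes are dropped as well. It is not cgilite's implementation, and normalize_path_sketch is a hypothetical name; use the real PATH function in your program.

```shell
# Hypothetical stand-in that mimics the rules described for PATH.
# The function body is a subshell, so IFS and "set -f" do not leak out.
normalize_path_sketch() (
  out=""
  IFS=/      # split the input at slashes
  set -f     # do not expand glob characters in segments
  for seg in $1; do
    case "$seg" in
      ''|.) ;;                     # drop empty and "." segments
      ..)   out="${out%/*}" ;;     # go up one level, never above the root
      *)    out="${out}/${seg}" ;;
    esac
  done
  printf '%s\n' "${out:-/}"        # an empty result becomes "/"
)

normalize_path_sketch '../../etc/passwd'   # prints /etc/passwd
normalize_path_sketch 'a/./b/../c'         # prints /a/c
```

Note how the leading .. segments are discarded rather than rejected: the result is always a path below whatever directory you prepend to it.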

Function HEX_DECODE

Decodes strings containing hexadecimal sequences, as in URL encoded strings, etc.

  • The first argument is the prefix marker for a hex tuple, e.g. '%' for a URL encoded string. It cannot be omitted but can be empty, in which case everything that looks like a hex tuple will be converted.
  • The second argument is the input string.
  • The output will be the plain string, possibly containing binary data and line breaks.

Example:

~$ HEX_DECODE % "some%20string"
some string

~$ HEX_DECODE '\x' '42 %42 \42 \x42'
42 %42 \42 B

~$ HEX_DECODE '' 'Sc3bcc39f73to66f'
Süßstoff
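
To show what decoding a hex tuple involves, here is a deliberately simple and slow stand-alone sketch that handles the '%' prefix only. It is not how cgilite implements HEX_DECODE, and percent_decode_sketch is a hypothetical name.

```shell
# Hypothetical stand-in: decode %XX sequences, copy everything else unchanged.
percent_decode_sketch() (
  rest="$1"
  while [ -n "$rest" ]; do
    case "$rest" in
      %[0-9A-Fa-f][0-9A-Fa-f]*)
        hex="${rest#%}"             # drop the prefix marker
        hex="${hex%"${hex#??}"}"    # keep only the two hex digits
        # Convert hex to octal, then let printf %b emit that byte.
        printf '%b' "\\0$(printf '%o' "0x$hex")"
        rest="${rest#???}"
        ;;
      *)
        printf '%s' "${rest%"${rest#?}"}"   # copy one character as-is
        rest="${rest#?}"
        ;;
    esac
  done
)

percent_decode_sketch 'some%20string'   # prints: some string
```

A % that is not followed by two hex digits is passed through literally in this sketch.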

Function GET, GET_COUNT, GET_KEYS

Function POST, POST_COUNT, POST_KEYS

Function REF, REF_COUNT, REF_KEYS

Internal functions cgilite_value, cgilite_count, cgilite_keys

... are used by the above GET, POST, and REF functions and should be considered private and internal to cgilite. Do not rely on using them, do not override them!

Function HEADER

Function COOKIE

Function SET_COOKIE

Function HTML

Function URL

Function REDIRECT