Getting the Original URL in Apache
There are various situations where one might want to know the full URL sent over HTTP by the user agent, before any rewriting has occurred.
Depending on the situation and setup, it can be as simple as using CGI variables such as path_info, redirect_url or request_uri, and within a JVM servlet getRequestUrl() may prove useful - but none of those are guaranteed to be the URL which Apache received, nor are any of Apache's other documented variables.
Fortunately there is a workaround, because one variable provided is the
first line of the HTTP request, which contains the desired request URL nestled
between the method and protocol, e.g. "GET /url HTTP/1.1" - meaning all that
needs doing is to chop the ends off.
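To make the "chop the ends off" idea concrete, here is a minimal Python sketch of the same operation done outside Apache. The function name original_url is hypothetical, and the sketch assumes a well-formed request line of the form "METHOD target PROTOCOL" separated by single spaces.

```python
def original_url(request_line: str) -> str:
    """Return the request target from a line such as 'GET /url HTTP/1.1'.

    Assumes the line has the shape 'METHOD target PROTOCOL'.
    """
    # Drop the method: everything up to and including the first space.
    _method, rest = request_line.split(" ", 1)
    # Drop the protocol: everything from the last space onwards.
    url, _protocol = rest.rsplit(" ", 1)
    return url


print(original_url("GET /url HTTP/1.1"))
```

Splitting once from the left and once from the right leaves the middle untouched, so the query string and any percent-encoding come through exactly as the user agent sent them.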
It is relatively simple to extract the URL, and at the same time provide it to later scripts, by using the RequestHeader directive from mod_headers to set and modify a header, like so:
RequestHeader set X-Original-URL "expr=%{THE_REQUEST}"
RequestHeader edit* X-Original-URL ^[A-Z]+\s|\sHTTP/1\.\d$ ""
The first line creates a header named X-Original-URL with the full value of the THE_REQUEST variable.
The second line performs a regex replace on the specified header, matching both the request method and its trailing space (^[A-Z]+\s) and the protocol and its leading space (\sHTTP/1\.\d$), and replacing each match with an empty string. The * after edit is what makes the replace occur multiple times - without it only the first match would be replaced (i.e. the * is equivalent to a g/Global flag).
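The effect of the pattern, and of replacing every match rather than just the first, can be checked outside Apache. The following is a rough Python sketch using the same regular expression. The helper names strip_all and strip_first are hypothetical, and Python's re module stands in for Apache's regex engine, which behaves the same way for this simple pattern.

```python
import re

# Same pattern as in the RequestHeader edit* directive.
PATTERN = re.compile(r"^[A-Z]+\s|\sHTTP/1\.\d$")


def strip_all(request_line: str) -> str:
    """Replace every match, like 'edit*' (a global replace)."""
    return PATTERN.sub("", request_line)


def strip_first(request_line: str) -> str:
    """Replace only the first match, like plain 'edit'."""
    return PATTERN.sub("", request_line, count=1)


line = "GET /page?id=1 HTTP/1.1"
print(strip_all(line))    # method and protocol both removed
print(strip_first(line))  # only the method removed
```

With the global replace both ends are removed, whereas the single replace leaves the trailing protocol in place.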
The name X-Original-URL is used for compatibility with the equivalent
header set by the IIS URL Rewrite Module - both that module and the above
solution provide the full request URL, including query string, and encoded in
whatever manner the user agent sent, but one difference is that the above
config always sets the header, whilst the IIS version only sets it when the URL
has been rewritten.
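As an illustration of how a later script might consume the header, here is a hedged Python sketch for a WSGI application, where request headers appear in the environ dictionary as HTTP_-prefixed keys. The function name get_original_url is hypothetical. The fallback branch is an assumption to cover servers that only set the header after a rewrite, and it rebuilds a URL from already-processed values, so it is not guaranteed to match what the user agent sent.

```python
def get_original_url(environ: dict) -> str:
    """Return the original request URL as seen by a WSGI application."""
    # WSGI exposes the X-Original-URL header under this CGI-style key.
    header = environ.get("HTTP_X_ORIGINAL_URL")
    if header:
        return header

    # Fallback (assumption): header absent, so approximate the URL
    # from the standard WSGI variables. These may be rewritten or decoded.
    url = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    return f"{url}?{query}" if query else url


print(get_original_url({"HTTP_X_ORIGINAL_URL": "/old/path?id=1"}))
```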