Useful Utilities

Domain, Host, URL

The Domain, DomainJavascript (DJ), Host, HostJavascript (HJ), and URL directives in config.txt have both unique functions and overlapping functions.

There are two separate instances in which EZproxy uses these entries to decide whether or not to proxy a particular hostname: while processing a starting point URL and when encountering hostnames that appear within web pages that are being proxied. An explanation of which directives affect which of these instances follows.

Terminology

The following terms will be used when referencing portions of a URL. These definitions are simplified: they are adequate for understanding their use within this document, but they are overgeneralized relative to their exact meanings.

scheme
Although many other schemes exist, for purposes of this document, the scheme is the protocol used for retrieval and is either http or https.
hostname
The name or address of the webserver to be accessed. Hostname is not case-sensitive (e.g., www.somedb.com and WWW.SomeDb.com are equivalent).
port
A number used to identify a specific webserver at the provided hostname. When omitted, a scheme specific default value is used. For http, the default is 80. For https, the default is 443.
origin
The combination of a scheme, hostname, and port, written as scheme://hostname:port.
path
The portion of the URL from a slash (/) following the origin up to the query or fragment. When omitted, the default path / is used.
query
The portion of the URL from the first question mark (?) following the path up to the fragment. If the first question mark (?) in a URL appears after a hash (#), that section is not the query, but rather part of the fragment.
fragment
The portion of the URL from a hash (#) through the end.

Sample URLs and their components

URL 1: http://www.somedb.com/
  scheme: http
  hostname: www.somedb.com
  port: 80
  origin: http://www.somedb.com:80
  path: /
  query: (none)
  fragment: (none)

URL 2: http://www.somedb.com:80
  scheme: http
  hostname: www.somedb.com
  port: 80
  origin: http://www.somedb.com:80
  path: /
  query: (none)
  fragment: (none)

URL 3: http://www.somedb.com/search?q=ancient
  scheme: http
  hostname: www.somedb.com
  port: 80
  origin: http://www.somedb.com:80
  path: /search
  query: ?q=ancient
  fragment: (none)

URL 4: https://www.somedb.com/search?q=ancient
  scheme: https
  hostname: www.somedb.com
  port: 443
  origin: https://www.somedb.com:443
  path: /search
  query: ?q=ancient
  fragment: (none)

URL 5: http://www.somedb.com:8080/history?era=darkages
  scheme: http
  hostname: www.somedb.com
  port: 8080
  origin: http://www.somedb.com:8080
  path: /history
  query: ?era=darkages
  fragment: (none)

URL 6: http://search.somedb.com:8080/history?era=darkages
  scheme: http
  hostname: search.somedb.com
  port: 8080
  origin: http://search.somedb.com:8080
  path: /history
  query: ?era=darkages
  fragment: (none)

URL 7: http://search.somedb.com:8080/history#?modern
  scheme: http
  hostname: search.somedb.com
  port: 8080
  origin: http://search.somedb.com:8080
  path: /history
  query: (none)
  fragment: #?modern

URLs 1 and 2 are functionally equivalent even though URL 1 uses the default port and URL 2 uses the default path.

URLs 1, 2 and 3 all use the same origin, even though 1 and 3 depend on the default port, whereas 2 has an explicit port and 3 has a path.

URLs 3 and 4 are not functionally equivalent as they use different schemes.

URLs 5 and 6 are not functionally equivalent as they use different hostnames.

URL 7 does not have a query since the first question mark (?) appears after the first hash (#).
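
As a rough illustration of the terminology above, the following Python sketch splits a URL into the same components using the standard library's urllib.parse, filling in the default port and default path. The function name split_url and the dictionary layout are assumptions made for this example only; they are not part of EZproxy.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this document considers.
DEFAULT_PORTS = {"http": 80, "https": 443}

def split_url(url):
    """Break a URL into the components defined in the Terminology section."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    hostname = (parts.hostname or "").lower()   # hostnames are not case-sensitive
    port = parts.port or DEFAULT_PORTS[scheme]  # omitted port: scheme default
    return {
        "scheme": scheme,
        "hostname": hostname,
        "port": port,
        "origin": f"{scheme}://{hostname}:{port}",
        "path": parts.path or "/",              # omitted path: default "/"
        "query": "?" + parts.query if parts.query else "",
        "fragment": "#" + parts.fragment if parts.fragment else "",
    }

# URLs 1 and 2 from the samples share an origin and a path.
print(split_url("http://www.somedb.com/")["origin"])
print(split_url("http://www.somedb.com:80")["path"])
```

Because urlsplit separates the fragment before it looks for a question mark, a URL such as sample URL 7 comes back with an empty query and the fragment #?modern, matching the rule stated above.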

Origins in starting point URLs (simple http with standard port case)

In general, EZproxy ignores the path, query and fragment when making proxying decisions. These details are only used when generating the URLs shown in the default menu page and the server status page.

Users are routed to specific databases using starting point URLs. Starting point URLs take the form:

http://ezproxy.yourlib.org:2048/login?url=http://www.somedb.com/index.html

where http://www.somedb.com/index.html is an example of a URL to which the user should be proxied.

When processing a starting point URL, EZproxy decides whether or not to allow access by taking the origin of the requested URL (e.g., http://www.somedb.com:80) and trying to find a URL, Host, or HostJavascript (HJ) directive with the identical origin. The Domain and DomainJavascript (DJ) directives are not directly involved in this processing. Any of the following directives in config.txt would be considered a match to authorize the starting point URL, since they share the same origin, http://www.somedb.com:80, as the starting point URL.

URL http://www.somedb.com/
URL http://www.somedb.com/index.html
URL http://www.somedb.com/history/
Host www.somedb.com
Host http://www.somedb.com
HJ www.somedb.com
HJ http://www.somedb.com

All three of these URL directives would authorize http://www.somedb.com/index.html for access, since all of them have the origin http://www.somedb.com:80, even though they have different paths.

Host and HostJavascript directives default to the http scheme, making the two Host directives equivalent and the two HostJavascript directives equivalent.

There is an exception in which Domain and DomainJavascript (DJ) directives are indirectly involved in authorizing hostnames for use in starting point URLs, but this occurs by a fluke that is discussed at the end of this page. The recommended method for configuring EZproxy is to keep in mind that the origin of any starting point URL must match the origin of a URL, Host, or HostJavascript line.

Origins in starting point URLs (advanced case)

The previous examples all assume that the destination URLs are of the form http://www.somedb.com, using http:// at the start and no port at the end. If a destination URL uses https:// or includes a non-default port number, then it will only match a URL, Host, or HostJavascript (HJ) line with the same information. For instance:

http://ezproxy.yourlib.org:2048/login?url=http://www.somedb.com:8080/index.html

has the origin http://www.somedb.com:8080 which does NOT match the origin of any of the previous examples, but would match:

URL http://www.somedb.com:8080/
URL http://www.somedb.com:8080/index.html
URL http://www.somedb.com:8080/history/
Host www.somedb.com:8080
Host http://www.somedb.com:8080
HJ www.somedb.com:8080
HJ http://www.somedb.com:8080
as these latter examples all have the origin http://www.somedb.com:8080.

Likewise, the starting point URL:

http://ezproxy.yourlib.org:2048/login?url=https://www.somedb.com/index.html

has the origin https://www.somedb.com:443 which does NOT match any of the previous examples, but would match:

URL https://www.somedb.com/
URL https://www.somedb.com/index.html
URL https://www.somedb.com/history/
Host https://www.somedb.com
HJ https://www.somedb.com
which all have the origin https://www.somedb.com:443.

Note in these examples that there is no simple "Host www.somedb.com" style of entry as that basic form of entry defaults to http, not https.
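
The origin comparison described in the two starting point sections above can be sketched in Python as follows. This is only a model of the documented rule, not EZproxy's implementation: the helper names origin_of, directive_origin, and authorizes are hypothetical, and the sketch assumes each directive line is a keyword followed by a single value with no extra options.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin_of(url):
    """Return scheme://hostname:port, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS[scheme]
    return f"{scheme}://{parts.hostname}:{port}"

def directive_origin(line):
    """Origin of a URL, Host, or HJ line; None for any other directive."""
    keyword, _, value = line.strip().partition(" ")
    if keyword.lower() not in ("url", "host", "hostjavascript", "hj"):
        return None  # Title, Domain, DJ, etc. do not authorize starting points
    value = value.strip()
    if "://" not in value:
        value = "http://" + value  # Host and HJ default to the http scheme
    return origin_of(value)

def authorizes(config_lines, target_url):
    """True if any line has exactly the same origin as the target URL."""
    wanted = origin_of(target_url)
    return any(directive_origin(line) == wanted for line in config_lines)

config = ["Title Some Database", "Host www.somedb.com", "Domain somedb.com"]
print(authorizes(config, "http://www.somedb.com/index.html"))
print(authorizes(config, "https://www.somedb.com/index.html"))
```

With this configuration the first check succeeds and the second does not, because "Host www.somedb.com" yields the origin http://www.somedb.com:80 while the https target has the origin https://www.somedb.com:443.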

Origins encountered during proxying

Starting point URLs inject the user into the proxying process. Once a user starts requesting web pages through EZproxy, EZproxy will start retrieving and processing web pages from remote servers. As a web page is retrieved, EZproxy must decide whether or not to rewrite web page links that it encounters.

As EZproxy encounters each URL, it will choose to proxy that URL if it matches under the starting point URL logic described above, and EZproxy will also consider Domain and DomainJavascript (DJ) directives. When attempting to match a Domain or DomainJavascript (DJ) line, EZproxy ignores the scheme (http:// or https://) and port. EZproxy considers a hostname to match a Domain or DomainJavascript (DJ) directive if the hostname matches the domain name or ends in the domain name. For instance, if EZproxy encounters:

http://www.history.somedb.com

this would not origin match any of the previous URL, Host, or HostJavascript directives, since http://www.history.somedb.com:80 does not exactly match the origin of any of those directives, but it would match:

Domain somedb.com
Domain history.somedb.com
Domain www.history.somedb.com
DJ somedb.com
DJ history.somedb.com
DJ www.history.somedb.com

since in each of these examples, the hostname www.history.somedb.com either matches the specified domain exactly or ends with a period followed by one of the specified domains.

Domain and DomainJavascript (DJ) directives allow EZproxy to automatically proxy all of the additional hosts that are used by a database vendor without requiring you to predict all the hostnames that might be encountered.
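
The Domain and DJ matching rule can be pictured with a small Python sketch. The function name domain_matches is hypothetical, and the sketch follows the stricter wording above, in which the hostname must either equal the domain or end with a period followed by the domain.

```python
def domain_matches(hostname, domain):
    """True if hostname equals domain or ends with '.' + domain.

    Scheme and port are not inputs, since Domain/DJ matching ignores them.
    """
    hostname = hostname.lower()  # hostnames are not case-sensitive
    domain = domain.lower()
    return hostname == domain or hostname.endswith("." + domain)

for domain in ("somedb.com", "history.somedb.com", "www.history.somedb.com"):
    print(domain, domain_matches("www.history.somedb.com", domain))
```

Requiring the leading period keeps an unrelated hostname such as www.notsomedb.com from matching a "Domain somedb.com" line in this model.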

IP addresses as hostnames

Some vendors use IP addresses as hostnames. In such an instance, the rules for URL, Host, and HostJavascript are exactly the same, using an exact match. To match a series of IP addresses with a Domain or DomainJavascript (DJ) directive, you must introduce an asterisk wildcard, such as:

Domain 132.174.*
DJ 132.174.*
which would match any IP address that starts with 132.174.
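
As a hedged illustration, the wildcard form can be modeled in Python with shell-style pattern matching from the standard fnmatch module. Treating the asterisk as a shell-style wildcard is an assumption of this sketch; the text above only states that the pattern matches addresses starting with the given prefix. The function name ip_pattern_matches is hypothetical.

```python
from fnmatch import fnmatchcase

def ip_pattern_matches(address, pattern):
    """True if the IP address fits a Domain/DJ pattern such as '132.174.*'.

    Assumption: '*' behaves like a shell-style wildcard.
    """
    return fnmatchcase(address, pattern)

print(ip_pattern_matches("132.174.1.5", "132.174.*"))
print(ip_pattern_matches("132.175.1.5", "132.174.*"))
```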

HostJavascript (HJ) and DomainJavascript (DJ)

The HostJavascript (HJ) and DomainJavascript (DJ) directives indicate that when EZproxy is proxying a web page from a matching server, additional JavaScript processing should be performed. For example:

Title Some Database
URL http://www.somedb.com
DJ somedb.com

indicates that all hosts that are somedb.com or end with .somedb.com should have additional JavaScript processing performed. In a mixture of JavaScript and non-JavaScript directives, the JavaScript directives take priority. For example, in:

Title Some Database
URL http://www.somedb.com
Host search.somedb.com
DJ somedb.com

when search.somedb.com is proxied, JavaScript processing will be enabled since its name matches the "DJ somedb.com" directive.
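
The priority of JavaScript directives within a single stanza can be sketched in Python as below. The representation of a stanza as a list of (directive, value) pairs and the function name javascript_enabled are assumptions for illustration; the sketch also compares hostnames only and leaves scheme and port out of the picture.

```python
def domain_matches(hostname, domain):
    """True if hostname equals domain or ends with '.' + domain."""
    hostname, domain = hostname.lower(), domain.lower()
    return hostname == domain or hostname.endswith("." + domain)

def javascript_enabled(stanza, hostname):
    """True if any HJ or DJ line in the stanza matches the hostname.

    A matching non-JavaScript line (Host, Domain) never switches it off,
    which is how the JavaScript directives take priority in this model.
    """
    for directive, value in stanza:
        if directive in ("HJ", "HostJavascript"):
            if value.lower() == hostname.lower():
                return True
        elif directive in ("DJ", "DomainJavascript"):
            if domain_matches(hostname, value):
                return True
    return False

stanza = [
    ("URL", "http://www.somedb.com"),
    ("Host", "search.somedb.com"),
    ("DJ", "somedb.com"),
]
print(javascript_enabled(stanza, "search.somedb.com"))
```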

When developing a database stanza, the recommendation is to start with the normal form of the directives. If you see the user slipping away from proxying, try the JavaScript counterparts to see whether they resolve the issue.

Conflicting stanzas

In some instances, a particular hostname may match multiple database stanzas in config.txt. For instance, consider http://search.somedb.com against these entries:

Title Some Database First
URL http://www.somedb.com
Domain somedb.com

Title Some Database Second
URL http://search.somedb.com
DJ somedb.com

Since http://search.somedb.com has an origin match to the second URL line, it can be used in a starting point URL.

Since http://search.somedb.com matches the second URL directive, the Domain directive, and the DJ directive, it would be rewritten if encountered in a web page.

Since EZproxy bases its proxying behavior on the very first database stanza that matches via URL, Host, HostJavascript, Domain, or DomainJavascript directives, ignoring subsequent stanzas, the first stanza controls proxying behavior. As such, the proxying of http://search.somedb.com will NOT have additional JavaScript processing, with the subsequent "DJ somedb.com" effectively ignored due to the earlier stanza.
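
The first-match rule can be illustrated with the following Python sketch. It is a simplified model rather than EZproxy's actual logic: the names line_matches and controlling_stanza are hypothetical, stanzas are represented as (title, lines) pairs, and URL/Host/HJ lines are compared by hostname only on the assumption that everything uses http on the default port.

```python
from urllib.parse import urlsplit

JS_DIRECTIVES = {"HJ", "HostJavascript", "DJ", "DomainJavascript"}
HOST_DIRECTIVES = {"URL", "Host", "HJ", "HostJavascript"}
DOMAIN_DIRECTIVES = {"Domain", "DJ", "DomainJavascript"}

def line_matches(directive, value, hostname):
    """Hostname-only match; scheme and port are assumed to be http and 80."""
    if directive in DOMAIN_DIRECTIVES:
        return hostname == value or hostname.endswith("." + value)
    if directive in HOST_DIRECTIVES:
        target = value if "://" in value else "http://" + value
        return urlsplit(target).hostname == hostname
    return False

def controlling_stanza(stanzas, hostname):
    """Return (title, javascript_enabled) for the first stanza that matches."""
    for title, lines in stanzas:
        matched = [d for d, v in lines if line_matches(d, v, hostname)]
        if matched:
            # Later stanzas are never consulted once one has matched.
            return title, any(d in JS_DIRECTIVES for d in matched)
    return None, False

stanzas = [
    ("Some Database First",
     [("URL", "http://www.somedb.com"), ("Domain", "somedb.com")]),
    ("Some Database Second",
     [("URL", "http://search.somedb.com"), ("DJ", "somedb.com")]),
]
print(controlling_stanza(stanzas, "search.somedb.com"))
```

In this model, swapping the order of the two stanzas makes the second one control search.somedb.com, which turns JavaScript processing on.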

A dangerous exception to the rule

Consider the database stanza:

Title Some Database
URL http://www.somedb.com
Domain somedb.com

By everything discussed thus far, this stanza would allow this URL to work:

http://ezproxy.yourlib.org:2048/login?url=http://www.somedb.com/

but would cause this URL to fail:

http://ezproxy.yourlib.org:2048/login?url=http://search.somedb.com/

since there is no URL, Host, or HostJavascript (HJ) line that matches the origin http://search.somedb.com:80. Yet, in practice, you will encounter scenarios where this will appear to work correctly.

This occurs when one of your users starts out by entering at http://www.somedb.com. As EZproxy retrieves that page, it encounters a link to http://search.somedb.com. At that point, search.somedb.com will match the Domain line, so EZproxy will proxy the link. At that moment, EZproxy creates a virtual web server for http://search.somedb.com. Once this happens, EZproxy will accept a starting point URL to http://search.somedb.com.

In this instance, http://search.somedb.com is working as a side-effect. It is a bad idea to depend on this type of behavior, as EZproxy can discard that information over time, causing the links that work one day to stop working the next if the ezproxy.hst file is reset.

As a result, any URL that appears in a starting point URL should always origin match with a URL, Host, or HostJavascript (HJ) directive.
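
The side effect described in this section can be pictured with a toy Python model. It is purely illustrative and makes no claim about EZproxy's internals or the contents of ezproxy.hst: the class StartingPointModel and its methods are hypothetical names, and the "learned" set simply stands in for the virtual web servers created while proxying.

```python
class StartingPointModel:
    """Toy model: configured origins are permanent, learned ones are not."""

    def __init__(self, configured_origins):
        self.configured = set(configured_origins)  # from URL, Host, HJ lines
        self.learned = set()                       # created while proxying

    def encounter_link(self, origin):
        """A proxied page linked to this origin and a Domain/DJ line matched."""
        self.learned.add(origin)

    def reset_learned_hosts(self):
        """Stand-in for the learned host information being discarded."""
        self.learned.clear()

    def accepts(self, origin):
        return origin in self.configured or origin in self.learned

model = StartingPointModel({"http://www.somedb.com:80"})
search = "http://search.somedb.com:80"
print(model.accepts(search))   # not configured, not yet learned
model.encounter_link(search)
print(model.accepts(search))   # works, but only as a side effect
model.reset_learned_hosts()
print(model.accepts(search))   # stops working after the reset
```

Adding a Host or HJ line for search.somedb.com corresponds to putting its origin in the configured set, which is unaffected by the reset.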

Further information

If you have any questions, comments or suggestions, from the smallest typo to the biggest problem, please send them to info@UsefulUtilities.com.


www.usefulutilities.com/support/cfg/dhu.html
© 2024 Useful Utilities, LLC