Rusty Japikse
My Life in Bits
Directory Level Security with a Google Mini Server
Background
A simple secure search solution can be built around a Google Mini server. This is useful for intranet applications where a single Google Mini has to provide searches of several different document repositories. Out of the box, the Google Mini does not provide any authentication or security mechanism, and because it is a sealed unit with a limited web interface, not much can be done to add security within the Google Mini itself.
However, the Google Mini does allow a security approach to be constructed around it. Documents can be grouped into sub-collections, and the sub-collection to search can be specified in the query URL (the search form uses the GET method). If a sub-collection is created for each set of documents that needs to be secured, searches can be routed through a secure proxy. At a minimum, this proxy should modify the request URL and rewrite the links in the returned HTML. To prevent users from circumventing the proxy, a firewall needs to be installed between the Google Mini and the secure proxy. This firewall only allows access to the Google Mini from the secure proxy.
Security Implementation
At work I set up a Google Mini to search two different document repositories, one with restricted access and the other with company-wide access. Here are the steps that I used to provide secure searches with the Google Mini.
From right to left:
- Google Mini - starting with the Google Mini, create a document sub-collection for each directory you want to secure. In my case that was two sub-collections. One was a set of documents spanning two different servers, which were to have company-wide access. The other was a directory full of documents on a secretarial server, which was only to be available to the front office group and to company executives.
Note: the Google Mini can handle an unlimited number of sub-collections; if you are patient enough, you can create as many different searches as you like.
- Firewall - some sort of firewall is needed in front of the Google Mini. One of the easiest solutions is to purchase a Linksys WRT54GL (that is the Linux-friendly router), install OpenWRT, and configure a separate subnet for the Google Mini to reside upon. The OpenWRT distribution also contains a nice web interface to configure everything. In brief, I configured our firewall as follows:
- The Google Mini receives a static IP address based upon its MAC address.
- Ports 80 and 8000 on the Google Mini are forwarded to the Linksys router.
- Except for TCP/IP connections from the secure proxy, all incoming connections for port 80 on the Google Mini are dropped.
- Connections to port 80 on the Linksys router and 8000 on the Google Mini are only allowed from network administrator computers; all others are dropped.
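The firewall rules above amount to a small allow-list. As a rough illustration only (the real configuration was done through the OpenWRT web interface), the policy can be modelled in Python as follows. Every IP address here is a made-up placeholder, and the final "drop everything else" default is my assumption rather than something stated above.

```python
import ipaddress

# All addresses below are hypothetical placeholders, not the real network.
MINI_IP = ipaddress.ip_address("192.168.10.2")    # Google Mini (static lease)
ROUTER_IP = ipaddress.ip_address("192.168.10.1")  # Linksys router
PROXY_IP = ipaddress.ip_address("10.0.0.20")      # secure proxy server
ADMIN_IPS = {ipaddress.ip_address("10.0.0.5")}    # network administrator PCs


def is_allowed(src, dst, port):
    """Return True if a new TCP connection should be accepted."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    # Search traffic: only the secure proxy may reach port 80 on the Mini.
    if dst == MINI_IP and port == 80:
        return src == PROXY_IP
    # Admin interfaces (Mini on 8000, router on 80): administrators only.
    if (dst == MINI_IP and port == 8000) or (dst == ROUTER_IP and port == 80):
        return src in ADMIN_IPS
    # Assumed default: anything not listed above is dropped.
    return False
```

Walking a few source/destination pairs through a function like this is a quick way to check that a direct request from an ordinary desktop to the Google Mini really is refused before relying on the router configuration.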
- Secure Proxy - on another server a secure proxy needs to be set up. The Google Mini proxy script has been designed for directory level security. As a result, the script does not provide any security of its own; access control is left to the web server that hosts it. The proxy filters the requested URL and then maps the request to a specific sub-collection.
When returning a results page from the Google Mini, the URLs are rewritten so that links point back to the proxy server. All of the additional features work too, including the advanced search and the help.
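To make the link rewriting concrete, here is a minimal sketch of the idea in Python 3. It is not taken from gproxy.py: the host name, the proxy URL, and the function name are hypothetical, and it only handles links to the search path, whereas the real script also has to cover pages such as the advanced search and the help.

```python
import re

# Hypothetical values for illustration; the real script may differ.
MINI_HOST = "http://mini.example.internal"  # address of the Google Mini
PROXY_URL = "/company/gproxy.py"            # URL of this proxy script

# Match href/action attributes that point at the Mini's search path,
# written either as an absolute URL or as a root-relative path.
_MINI_SEARCH_LINK = re.compile(
    r'(\b(?:href|action)\s*=\s*["\'])'     # attribute name and opening quote
    r'(?:' + re.escape(MINI_HOST) + r')?'   # optional absolute Mini prefix
    r'/search(?=[?"\'])',                   # the search path itself
    re.IGNORECASE,
)


def rewrite_links(html):
    """Point the Mini's own search links back at the proxy script."""
    return _MINI_SEARCH_LINK.sub(lambda m: m.group(1) + PROXY_URL, html)
```

Links to the result documents themselves use other hosts and paths, so they pass through unchanged; only navigation that would otherwise lead straight to the Google Mini is redirected to the proxy.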

Since much of the webpage is being rewritten on the fly, it is trivial to rewrite additional parts of the webpage. In our case the logo and footer were customized. Also specific to the implementation at work was the decision to use IIS to serve the secure proxy script. Here is a very brief rundown of how I did this:
- Set up Python for CGI use (Microsoft Knowledge Base Article 276494).
- Create a directory in the web root for each search setup. I used three: one for company-wide access, and two for the front office and executives (one searching just the secretarial files, the other a global file search). Although optional, I also set up virtual directories for each of these directories.
- Set the directory security so that only the users and/or groups that should use this search have read and execute access to the directory. Integrated authentication was used in our instance.
- Set the default file name to be served to the name of your Python script.
- Configure the variables in the header of the Python script to map the script to a particular sub-collection.
- Test
- Customize the HTML to your application.
- Test!
- Users - point your users towards the appropriate URLs that they have permission to access. If you care about providing users with a good interface (and like nearly all corporations, your users are on Windows), then you can use group policies to configure the Google Toolbar and the Google Desktop Search to talk directly with your secure proxy.
Note: if a malicious user tries to inject a different sub-collection into the request URL, the bad GET parameter should be stripped by the regular expression filters in the proxy script.
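The filtering mentioned in the note can be sketched as follows. This is not the code from gproxy.py: it uses an allow-list built on `urllib.parse` rather than regular expressions, it is written for Python 3, and the collection name and the list of permitted parameters are hypothetical. It also assumes that the sub-collection is selected by a `site` GET parameter, so check the search protocol documentation for your appliance.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical settings for one proxy instance.
COLLECTION = "company_wide"             # sub-collection this proxy may search
ALLOWED_PARAMS = {"q", "start", "num"}  # parameters passed through untouched


def build_mini_query(query_string):
    """Drop unknown parameters and pin the request to one sub-collection."""
    pairs = [
        (key, value)
        for key, value in parse_qsl(query_string, keep_blank_values=True)
        if key in ALLOWED_PARAMS
    ]
    # Assumption: "site" is the parameter that selects the sub-collection.
    # Any user-supplied value was removed above; the proxy sets its own.
    pairs.append(("site", COLLECTION))
    return urlencode(pairs)
```

For example, a request carrying `q=budget&site=secretarial&start=10` is forwarded as `q=budget&start=10&site=company_wide`, so the injected sub-collection never reaches the Google Mini.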
- Download gproxy.py.
- You may also find it useful to read about CrawlFS.py, a Python script for generating an HTML file that can be crawled by the Google Mini. (http://japikse.com/resources/scripts/crawlfs.html)