Using Spider (Windows version)
Spider's purpose is to identify files that may contain confidential data. It scans a collection of files, searching for patterns of numbers or letters that resemble Social Security numbers or credit card numbers (additional search patterns can be created using Unix regular expressions). Spider creates a log that lists all the files identified as potentially containing confidential data. The person using Spider should then look through this log, examine each of the files listed, and take steps to protect any files that prove to contain confidential data. Protection steps may include encrypting files, or moving files to a secure server or to offline storage. It is against University policy to keep sensitive data on an unsecured workstation.
Spider will misidentify certain types of files as containing confidential data. Every effort should be made to verify Spider's results before moving, encrypting, or removing files. RPM, TIFF, and TrueType font files are among the file types notorious for false positives.
Spider's logs can function as a roadmap to confidential data and must be well secured.
Install Spider
Requirements: Windows 2000, XP, or 2003 with Microsoft .NET 1.1 installed
- To ensure you have the latest .NET 1.1:
- Go to http://windowsupdate.microsoft.com
- Click on the Custom button
- After your system has been analyzed, click on the Software, Optional(n) area on the left of the browser and check Microsoft .NET Framework version 1.1
- Click Review and install updates
- Click Install Updates
- If you did not have .NET 1.1 installed you will have to go back to http://windowsupdate.microsoft.com to get .NET 1.1 Service Pack 1
- Click on the Express button
- After your system has been analyzed, click on the Install Updates button
- When the download and installation are complete, you will have to restart your computer.
- To install Spider:
- Download Spider
- Run the downloaded file, Setup.exe
- A setup program will run. Click Next > in each window, except the License window where you need to click Yes. Click Close in the final window.
Run Spider
- From the Start menu, choose All Programs, then Cornell Spider, then spider
- Click the Run Spider button
- Spider may take a long time to run, depending on the number of files to be searched. When it has finished, the Spider window will change back to its initial state as shown below, and the log file will be ready for review. The log file is C:\SPIDER.LOG unless you use the configuration options shown in the next section to specify a different location.
Spider in operation:
Spider starts by building a list of files that meet the scan selection criteria, then begins its work.
Files Spider WILL process:
- any file marked by Windows as Archive, Normal, Compressed, or Temporary
- any of the above file types that is not open and locked by another application
- any unencrypted file stored within a readable ZIP archive
Files Spider will NOT process:
- files open and locked by another process (e.g., an Excel spreadsheet currently opened by Excel)
- system files (executables, DLLs, the pagefile, the hibernation file, etc.)
- encrypted files (on the assumption they are inaccessible)
- sparse files (generally weird databases that Spider couldn't reliably parse anyway)
Spider also has a command-line interface.
Configure Spider
Spider can be run without adjusting any settings, but the Spider window offers a number of options to customize your session and log.
To adjust Spider settings, choose Configure, then Settings.
Runtime tab
- Minimize window when spidering
- causes the Spider window to minimize while Spider is working. Be aware that if you do other work on the same computer while Spider is running, Spider may not be able to scan files that are in use by other applications.
- Process priority
- Normal: Spider allows Windows to assign a priority to its file processing task.
- Low: Spider will ask Windows for a lower process priority while scanning files. This preserves resources and is less disruptive while the system is in use.
- High: Spider will ask Windows for a higher priority while scanning files. This will cause Spider to complete its task sooner, but with noticeable impact on system responsiveness.
- When Finished
- Exit: Spider will exit immediately after it is done scanning files.
- Restore window: Spider will restore itself from a minimized state when it is done scanning files.
- View Log: Spider will spawn the log viewer and display its scan results when it is done scanning files.
Files tab —> Directory tab
- All local drives:
- tells Spider to systematically process all local drives, starting with the root directory
- - or - (if you click the All local drives box, the option below will gray out)
- Start dir:
- tells Spider to start processing with the selected directory, initially the C drive. To specify the drive or directory of your choice, you can either type in the white box, or click the Start dir... button, browse, and then click OK.
- Paths to skip
- Lists folders that you do not need Spider to scan. This button opens a new window, where you click Add New for each path you want Spider to skip.
- Recursively process subfolders:
- tells Spider to descend into directories it finds and process the files there. This is Spider's standard behavior and is recommended in most circumstances.
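As a rough illustration of how the Directory tab settings interact, the Python sketch below walks a start directory, prunes any skipped paths, and optionally descends into subfolders. It is not Spider's actual code; the function name `list_candidate_files` and its parameters are hypothetical.

```python
import os

def list_candidate_files(start_dir, paths_to_skip=(), recursive=True):
    """Return files under start_dir, honoring skip paths and the recursion flag."""
    skip = {os.path.normcase(os.path.abspath(p)) for p in paths_to_skip}
    found = []
    for root, dirs, files in os.walk(start_dir):
        # Prune skipped folders in place so os.walk never enters them.
        dirs[:] = [
            d for d in dirs
            if os.path.normcase(os.path.abspath(os.path.join(root, d))) not in skip
        ]
        found.extend(os.path.join(root, name) for name in files)
        if not recursive:
            break  # only the start directory itself is processed
    return sorted(found)
```

Pruning a folder also excludes everything beneath it, which is presumably what a skipped path is meant to achieve.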
Files tab —> Types tab
- All accessible files
- tells Spider to process any file it can successfully read. This is the default.
- - or -
- View/Edit
- allows you to specify file extensions that should explicitly be scanned or ignored.
- Uncheck the "All accessible files" box to make the "View/Edit" button clickable.
- Click the View/Edit button.
- You will see two lists, "File Extensions to Scan" and "File Extensions to Skip." Click the Add button next to either list.
- A popup window directs you to enter a file extension with no punctuation; for example, TIFF with no period. Do so and click OK.
- Repeat for all the filetypes you wish to specify.
- Click Save when done.
Files tab —> Options tab
- Scan whole file
- tells Spider to process each file from beginning to end.
- Scan only the first XX KB
- tells Spider to move on to the next file if no matching patterns are found in the first XX kilobytes of the current file.
- Scan alternate data streams
- tells Spider to search each file for NTFS alternate data streams and scan those. Alternate data streams can be attached to normal files but hidden, which is a security risk.
- Read Excel files with OLE
- tells Spider to try to scan Excel files that contain content embedded from another program with Object Linking and Embedding (OLE). This does not affect most Excel files.
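The trade-off between "Scan whole file" and "Scan only the first XX KB" can be pictured with a small Python sketch. It is an illustration only, not Spider's implementation, and the helper name `read_scan_window` is hypothetical.

```python
def read_scan_window(path, max_kb=None):
    """Return the bytes to scan: the whole file, or only the first max_kb kilobytes."""
    with open(path, "rb") as handle:
        if max_kb is None:
            return handle.read()           # "Scan whole file"
        return handle.read(max_kb * 1024)  # "Scan only the first XX KB"
```

A limited window is faster on large files, but it will miss a pattern that first appears beyond the cutoff.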
Regexes tab
This window tells Spider which patterns to search for in each file scanned. You can select any or all of the three checkboxes shown; these correspond to pre-compiled regular expressions that match Social Security numbers and two types of credit card numbers. You can also add your own regular expressions.
Spider processes all the regular expressions in the order shown in this window, starting with the pre-compiled regular expressions, then the user-supplied ones. (The pre-compiled expressions are considerably faster than user-supplied ones, which are compiled at run-time.) The first successful match found in a file causes Spider to cease processing the file, close it, and add it to the list in the log file.
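The ordered, stop-at-first-match behavior described above can be sketched in Python as follows. The three patterns are simplified stand-ins written for this example (Spider's built-in expressions are not reproduced in this document), and the function name `first_match` is hypothetical.

```python
import re

# Simplified, illustrative patterns; not Spider's actual built-in expressions.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("VMCD", re.compile(r"\b(?:4\d{3}|5[1-5]\d{2})(?:[ -]?\d{4}){3}\b")),
    ("AMEX", re.compile(r"\b3[47]\d{2}[ -]?\d{6}[ -]?\d{5}\b")),
]

def first_match(text, patterns=PATTERNS):
    """Try each pattern in order; return (label, matched text) for the first hit, else None."""
    for label, pattern in patterns:
        found = pattern.search(text)
        if found:
            return label, found.group(0)  # stop at the first successful match
    return None
```

Because the search stops at the first hit, a file containing both a Social Security number and a credit card number would be reported under whichever pattern is listed first.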
To add your own regular expressions, click the Add regex button. A sample is shown below.
![]()
- In the Test Data box, you may optionally enter a sample of the type of number you want Spider to search for, and click the Test button to see whether Spider finds it.
- The Luhn checksum validator uses a standard algorithm for distinguishing actual credit card numbers from similar strings of numbers.
- The SSN area/max group validator distinguishes Social Security numbers from similar strings.
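For readers unfamiliar with the Luhn checksum, here is a minimal Python sketch of the standard algorithm. It is offered as an illustration of the technique, not as the code Spider itself runs; the function name `luhn_valid` is hypothetical.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

A random 16-digit string passes this check only about one time in ten, which is why the validator cuts down on false positives without eliminating them.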
Logging tab —> Local tab
See note below about importing CSV files into another application.
- Write a local log file: create a log file on this machine, at the location specified
- Append to log: add the new log to the end of an existing log
- Write CSV log file: create a comma-delimited log file containing the specified information about each file:
- Footer...: text to append to the log file
- Path: full drive and path to the file
- Hash: MD5 hash of the file, useful for identifying identical files stored with different names or paths
- MIME type: MIME type of the file (application/ms-excel, etc.)
- Size: size of the file, in bytes
- File Type: file type: Archive, Normal, Compressed, Hidden, System, etc.
- Create time: creation time of the file
- Access time: access time of the file; more often than not, this is the time Spider opened the file
- Modify time: last modification time of the file
- Regex: the regular expression matched: SSN for Social Security number, VMCD or AMEX for credit card number
- Match fragment: the section of the file that matched one of the regular expressions sought. Match fragments can only be written in an encrypted CSV file.
- Total matches: total number of matches found; this must be selected if you want a File Score
- File Score: a number between 0 and 1 indicating the probability that the file contains a valid match
- Encrypt log file: Spider will prompt for a password (at least 16 characters in length) and use that to encrypt the local log file. Log file encryption is incompatible with unattended operation (see below)
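To show why the Hash column is useful for spotting identical files stored under different names or paths, here is a short Python sketch that groups files by MD5 digest. It illustrates the idea only; the helper names `md5_of_file` and `group_identical` are hypothetical and are not part of Spider.

```python
import hashlib
from collections import defaultdict

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def group_identical(paths):
    """Return lists of paths whose contents share the same MD5 digest."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each returned group is a set of copies that can be reviewed once and then protected or removed together.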
Logging tab —> Event Logging tab
- Log to Windows Event Logger
- Spider will send matches and, optionally, progress updates to a new event log named "SpiderLog"
- Send progress updates to event logger
- Spider will periodically report on its progress
Logging tab —> UNIX syslog tab
- Send to UNIX syslog: Spider will send standard syslog messages to a UNIX loghost (514/UDP outbound may need to be opened on any local firewalls)
- Log host: hostname of the UNIX system that will receive logs
- Log Facility: local0 through local7; select the log facility Spider will use when reporting.
- Log priority: defaults to LOG_INFO
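To make the facility and priority settings concrete, the Python sketch below builds a simplified BSD-syslog style datagram and sends it over UDP. The exact message format Spider emits is not documented here, so the `spider` tag, the message layout, and both function names are assumptions made for illustration.

```python
import socket

FACILITY_LOCAL0 = 16  # local0..local7 are syslog facility codes 16..23
SEVERITY_INFO = 6     # numeric value of LOG_INFO

def syslog_packet(message, local_facility=0, severity=SEVERITY_INFO, tag="spider"):
    """Build a simplified syslog datagram: '<PRI>tag: message'."""
    if not 0 <= local_facility <= 7:
        raise ValueError("local_facility must be 0 through 7")
    # The PRI value combines facility and severity: facility * 8 + severity.
    pri = (FACILITY_LOCAL0 + local_facility) * 8 + severity
    return f"<{pri}>{tag}: {message}".encode("utf-8")

def send_to_loghost(packet, log_host, port=514):
    """Send one datagram to the log host over UDP (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (log_host, port))
```

With local0 and LOG_INFO the PRI value is 134, which is what a loghost would use to route the message to the matching rule in its syslog configuration.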
Command-line options:
Spider can be run from the Windows task scheduler and will operate as it does through the GUI, except that it will automatically exit when finished. Log results can then be collected and analyzed. The command syntax is as follows:
c:\path\to\spider\spider.exe /options
and the options are:
- /R: [path or file]
- causes Spider to assume the supplied path is its start directory and recursively scan it. If provided a file instead, Spider assumes it contains a list of paths, one to a line, that it should sequentially process.
- /D: [path or file]
- causes Spider to process the supplied path non-recursively, i.e., without descending into subdirectories. If supplied a file, Spider will scan that file and quit.
- /run
- Spider will start and run unattended.
- /L: [path]
- write the log file to the specified path. This allows Spider to take advantage of Windows environment variables to create machine-specific log paths, for example:
c:\path\to\spider\spider.exe /L: z:\%COMPUTERNAME%.log
will create Spider logs based on the system's host name.
Spider will accept conventional Windows paths (c:\foo) or UNC paths (\\PC1\C$). When run from the shell, Spider also accepts Windows environment variables like %COMPUTERNAME%.
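When scheduling scans on several machines, it can help to generate the command line from a script. The Python sketch below assembles one from the options listed above. It assumes that /R:, /L:, and /run can be combined in a single invocation, which this document does not confirm, and it does not quote paths containing spaces. The function name `spider_command` is hypothetical.

```python
def spider_command(exe_path, start_dir=None, log_path=None, unattended=True):
    """Assemble a Spider command line (assumes the options may be combined)."""
    parts = [exe_path]
    if start_dir:
        parts += ["/R:", start_dir]  # recursive scan from this directory
    if log_path:
        parts += ["/L:", log_path]   # write the log file here
    if unattended:
        parts.append("/run")         # start and run without user interaction
    return " ".join(parts)

# %COMPUTERNAME% is left for the Windows shell to expand at run time.
command = spider_command(
    r"c:\path\to\spider\spider.exe",
    start_dir=r"c:\foo",
    log_path=r"z:\%COMPUTERNAME%.log",
)
```

Test the resulting string by hand on one machine before placing it in a scheduled task.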
Notes
Importing CSV files into another application:
- In the Spider toolbar, select File -> View Log.
- Select the log file to view and click Open.
- If the file is encrypted, supply its password when prompted.
- When the results are displayed, select Copy to Clipboard.
- Paste those results into Notepad and save as a file with the extension .csv.
- Excel should be able to import the file, using the first line as column headers.
- Delete or, ideally, wipe the CSV file and store the Excel spreadsheet safely.
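If you prefer a script to Excel, the saved CSV file can also be read with Python's standard csv module, again using the first line as column headers. In the sketch below, the sample text and the column names (Path, Regex, Total matches) are assumptions based on the field names listed in the Local tab section; check them against the header line of your own log.

```python
import csv
import io

def load_spider_csv(text):
    """Parse CSV log text, treating the first line as column headers."""
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical sample; real logs contain whichever columns you selected.
sample = (
    'Path,Regex,Total matches\r\n'
    '"C:\\docs\\b.txt",VMCD,1\r\n'
    '"C:\\docs\\a.xls",SSN,3\r\n'
)
rows = load_spider_csv(sample)

# Review the files with the most matches first.
flagged = sorted(rows, key=lambda row: int(row["Total matches"]), reverse=True)
```

As with the Excel route, treat any script output as sensitive and store or wipe it accordingly.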
File server scanning:
Spider can scan a mapped network drive just as it would a local drive. It will be forced to skip any file it does not have the necessary rights to open, or any file that is open and locked by another application.
Compatibility with Spider for Linux:
Spider for Linux is a useful tool for security incident response in both Windows and UNIX environments. It can scan a mounted Windows partition, and can handle any file type including system files that the Windows version cannot process. This method provides extra visibility for systems that can afford the downtime necessary to boot a Helix CD.
Running the Linux version under Helix is forensically sound, if the partition to be examined is mounted read-only. The Windows version is *not* forensically sound, as it will update access times on all the files it touches. Therefore, we recommend it for the auditing role and Linux Spider for the incident response role.

