Data Discovery Software

Data discovery software is used to search a computer for sensitive information, including some of the types of data Cornell has classified as confidential, such as social security numbers and credit card numbers.

A data discovery software program examines the documents on a computer's hard drive, or a specified portion of the hard drive, looking for possible instances of sensitive information.  You can also scan removable or external drives and network shares. Depending on how much material is on the disk, and how powerful the computer is, this scanning process can take a significant amount of time. Once the scan has completed, you will be able to review the results.
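As a rough illustration of the scanning step described above, the following Python sketch walks a folder and reports lines that contain something shaped like a social security number. It is not how Spider or Identity Finder actually work; the function name `scan_tree` and the pattern `SSN_LIKE` are hypothetical, and real tools use far more elaborate rules and can read many more file formats than plain text.

```python
import os
import re

# Hypothetical pattern: nine digits written as 123-45-6789.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_tree(root):
    """Walk every file under root and return (path, line_number, match) hits."""
    hits = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                # Treat every file as text; undecodable bytes are ignored.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        for match in SSN_LIKE.finditer(line):
                            hits.append((path, line_number, match.group()))
            except OSError:
                # Unreadable files are skipped rather than stopping the scan.
                continue
    return hits
```

Because every file under the chosen folder is opened and read line by line, the running time grows with the amount of material on the disk, which is why a full scan can take a long time.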

For each instance of what might be sensitive data, the data discovery software shows you where it was found and gives you the choice of:

  • Securely deleting (shredding) the file in which it appeared, or editing out (redacting) the sensitive data;
  • Setting the file aside in a special location; or
  • Taking no action.

Data discovery software available at Cornell

Spider is a software package developed here at Cornell, and is also used at a number of other institutions.  The current version, Spider2008, is for computers running Microsoft Windows.  Older versions are also available for Mac OS X and Unix.

More information about Spider

Download Spider2008

Spider for Mac OS X and for Unix

Identity Finder is commercial software that has been licensed by some campus departments.  Originally just for Microsoft Windows, the current release also includes a version for the Macintosh.  To find out whether or not you are eligible to use Identity Finder, check with your local IT support.

More information about Identity Finder

Limitations of data discovery software

  • Although these software packages use a complex set of rules to determine whether or not a string of numbers could be, say, a credit card number, they will inevitably make mistakes.  When the software erroneously identifies something as sensitive data, this is called a “false positive.”
  • Sometimes the software will miss an instance of sensitive data, producing a “false negative.”  This is commonly the case when something like a social security number appears in the middle of a long string of characters, with nothing to indicate where it starts and ends.
  • Data discovery software will not be able to find sensitive information embedded in an image, like the account number in a scan of your bank statement.
  • Some packages will not be able to find certain types of data.  Identity Finder comes configured to look for bank account numbers but Spider does not.
  • The software may not be able to read certain types of files, or may not do so by default.  Spider comes ready to scan FileMaker databases but this has to be explicitly enabled in Identity Finder.
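The first two limitations can be seen in a minimal Python sketch. It assumes a simplified rule set: a 16-digit number counts as a possible credit card number if it passes the Luhn checksum (a standard check used for card numbers), and a social security number is only recognized when it stands apart from surrounding letters and digits. These rules and the names `passes_luhn`, `find_card_candidates` and `find_ssn_candidates` are assumptions for illustration, not the rules used by Spider or Identity Finder.

```python
import re

# Hypothetical, simplified patterns; real packages use many more rules.
CARD_LIKE = re.compile(r"\b\d{16}\b")
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def passes_luhn(digits):
    """Return True if a string of digits satisfies the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            # Double every second digit, counting from the right.
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_candidates(text):
    """Return 16-digit numbers in text that pass the Luhn checksum."""
    return [m.group() for m in CARD_LIKE.finditer(text) if passes_luhn(m.group())]


def find_ssn_candidates(text):
    """Return SSN-shaped numbers that are not glued to other characters."""
    return [m.group() for m in SSN_LIKE.finditer(text)]


# False positive: any 16-digit number that happens to pass the checksum is
# flagged, even if it is really an order or tracking number.
print(find_card_candidates("order 4111111111111111 shipped"))

# False negative: the same nine digits are missed once they sit inside a
# longer run of characters with nothing marking where they start and end.
print(find_ssn_candidates("SSN: 123-45-6789"))
print(find_ssn_candidates("ref123-45-6789x"))
```

Loosening the boundary rule would catch the last case but would also flag many more harmless strings, which is the trade-off behind both kinds of mistake.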

While data discovery software can be a very useful tool for finding sensitive data, you cannot necessarily expect it to find everything.  In the end, you yourself bear the responsibility of knowing what types of information are on your computer.

What to do when the scan finds confidential data

University Policy 5.10, Security of Electronic University Administrative Information, sets forth specific requirements for the treatment of data classified as confidential.  Elsewhere on this site, you can find less technical summaries of how to secure a computer holding such information, including when the data must be encrypted, and other safeguards for handling sensitive data.

In general, the less confidential data you have on your computer, the better, and best of all is to not store confidential data there at all.  Whenever you are holding confidential data, you need to be worried that it might be exposed if your computer is infected with malicious software designed to steal valuable information, or if the computer itself is stolen.

Local practices

Your college, unit or department may have developed local practices for handling confidential data.  These practices can go beyond what is required by policy.  A department may require that any computer with confidential data use encryption, and not just portable devices.  Another possible approach is to forbid any storage of confidential data on staff desktops and laptops.

If you are at all uncertain what to do when you find confidential information on your computer, please consult with your supervisor or someone else in local administration.

Removing data that no longer needs to be on your computer

When there is confidential data on your computer that you no longer need for your daily work, you must remove it.  This is a policy requirement.

Before you simply delete the file, consider whether it is the sole source for this data, or whether it contains information that might be needed again.  If so, your department may have a secure file server or some other electronic archive it uses for such material.

Another approach is to copy the files with sensitive data to a CD or DVD, and then delete them from your computer.  The CD or DVD should then be stored in a secure, locked location.

Encryption

If you need to keep a file with sensitive data on your computer, you will have to determine whether policy or local practice requires that you encrypt it.  Before you encrypt university data, check with local IT personnel to find out what options your department supports.

When you encrypt data, Policy 5.3, Use of Escrowed Encryption Keys, requires that you store the password with a designated authority in your department.  This is to ensure that the university can still gain access to this data even if you have left or are otherwise unavailable when the information is needed.