Cornell Office of Information Technologies

IT Architecture Initiative


As part of the IT Architecture Initiative, the Office of Information Technologies (OIT) is producing a series of papers outlining directions in information technology architecture. In the spirit of RFCs (requests for comments), the papers are written to foster understanding and to open dialogue about information technology trends at Cornell, with the goal of improving the use of information technology services throughout Cornell. This paper and the others in this series can be found on the IT Architecture Initiative web site (www.cit.cornell.edu/oit/Arch-Init/papers.html).

This paper was prepared in conjunction with the OIT policy advisor's office.   



Can Cornell Read My E-mail? Are My Deleted Files GONE gone? The Facts About Data Access and Retention at Cornell

Prepared by R. David Vernon, Tracy Mitrano, and Marcia Poulsen
10/02

SYNOPSIS

This paper reviews the ability of Cornell information technology service providers to access information transmitted or stored on Cornell systems. It also touches on related laws and policies. The sections are as follows:

Realities of the electronic age

Is Cornell reading my e-mail?

What information about me does Cornell save?

What about encryption?

Does Cornell sell information about me?

Action items

Closing thoughts


REALITIES OF THE ELECTRONIC AGE

The intimacy people feel with their computers contrasts sharply with the fact that network operators can see electronic communications, governments with proper authorization can intercept transmissions or obtain stored data, and snoops or hackers can all too easily intercept communications or invade an individual's computer. For some who have expressed emotions or political thoughts in e-mail, it can be a shock to learn that their messages have been posted on the web or widely circulated as the result of easy forwarding. Electronic diaries and wills have been sent out as documents as the result of a computer virus. The compromise of a credit card or social security number produces obvious credit problems. The potential for harassing or defamatory messages to be put on the web for the world to see is sobering.

Given such realities, it's not surprising that Cornell's OIT policy officer is often asked--in the same breath with the word "privacy"--questions like these:

1. Technically, can Cornell system administrators read my e-mail?

2. DO they read it?

3. When I delete a file from a central computer (server), is it GONE gone?

4. If not, who keeps it, what do they keep, why, where, and for how long?

5. What web-related data can system administrators see, and what can they infer from it?

6. Does Cornell monitor the content of transactions on its network?

 

Here are the "quick" answers--which show that there are no quick answers:

1. Yes... unless you encrypt it.

2. No... with rare exceptions.

3. Probably not.

4. Depends, depends, depends, depends, and depends.

5. Perhaps more, or perhaps less, than you might guess.

6. Only under extreme circumstances.

For fuller answers to these and similar questions, read on. Please keep in mind that, given the distributed nature of network computing at Cornell, terms like "system operator" and "network administrator" may apply to Cornell Information Technologies (CIT) employees, or to employees within Cornell's individual academic or administrative departments, or to all. In some cases these groups share common policies and practices; in other cases they don't. (To see if local practices differ, check with your local service provider.)

Please also keep in mind that, in the opinion of the principal authors of this paper, CIT system and network operators take pains to protect the "privacy" of computer users in the Cornell community. Still, some concerns remain to be addressed. It is our hope that what you read here will encourage you to engage in informed discussions, moving the campus forward in its collective understanding of, and approach to, these issues.

Note: Because the term "privacy" is ambiguous and potentially misleading in the context of this discussion, this paper will generally avoid it in favor of language more specifically related to data access and retention. For a broader discussion, see "Does the Postman Read Twice: 'Privacy' Considerations of Law and Policy of Electronic Communications at Cornell University."

 

IS CORNELL READING MY E-MAIL?

Let's start with this common concern: Is anyone out there reading my e-mail? Given the nature of the technology, we must accept that Cornell's network operators, like their counterparts elsewhere, have the ability to see most e-mail messages. In fact, they do occasionally see such communications while performing necessary jobs.

A couple of analogies illustrate the transparency of unencrypted data. E-mail messages have been compared to postcards; any postal worker could read what's written on a postcard. More to the point, e-mail is like a telephone conversation; phone company operators or technicians can and sometimes do break into live communications in the course of their duties.

It's important, though, to distinguish between what Cornell system operators can see and what they do see. Sheer volume of e-mail renders the routine reading of messages impossible. In fiscal year 2001-2002 alone, 358 million messages were routed through CU networks. But volume is not the only impediment. OIT/CIT employees and contractors sign an agreement requiring the confidentiality of all data to which the person has access. Even if system or network operators happen to observe an e-mail message in the course of standard business procedures, they are obliged to maintain its confidentiality. (Of course, if they observe material that they reasonably believe violates law or policy, they are obliged, under the university's Responsible Use of Electronic Communications policy, to report that violation to the Office of Judicial Administration, the Cornell police, or the OIT policy advisor.)
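To put that volume in perspective, the arithmetic can be sketched as follows. The one-minute reading time per message and the 2,000-hour work year are illustrative assumptions, not CIT figures; only the 358 million total comes from this paper.

```python
# Rough scale of the e-mail volume cited above
# (358 million messages in fiscal year 2001-2002).
MESSAGES_PER_YEAR = 358_000_000

# Assumptions for illustration only (not CIT figures):
SECONDS_TO_READ_ONE_MESSAGE = 60      # one minute per message
WORK_HOURS_PER_PERSON_YEAR = 2_000    # roughly one full-time work year

messages_per_day = MESSAGES_PER_YEAR / 365
messages_per_second = messages_per_day / (24 * 60 * 60)

# Total reading time in hours, then the number of full-time readers required.
reading_hours = MESSAGES_PER_YEAR * SECONDS_TO_READ_ONE_MESSAGE / 3600
readers_needed = reading_hours / WORK_HOURS_PER_PERSON_YEAR

print(f"{messages_per_day:,.0f} messages per day")
print(f"{messages_per_second:.1f} messages per second")
print(f"{readers_needed:,.0f} full-time readers needed")
```

Under these assumptions the volume works out to roughly 980,000 messages a day, or about 11 per second, and reading them all would occupy close to 3,000 people full time.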

Beyond this protection, the law may offer some safeguards, although how far they extend is unsettled. The Electronic Communications Privacy Act (ECPA) criminalizes the disclosure of electronic communications except in the case of authorized legal papers, in the case of one party's consent, or in the case of an emergency defined as immediate danger to life and limb. This law, however, applies only to "public communications," not to private networks. Case law has yet to decide whether ECPA applies to colleges and universities or to specific constituencies or mailings, such as students or list services that go out to the public. To date, employment law grants no privacy protection for employee users of private networks.

Cornell University "raises the bar" a little higher. University policy 5.1 on the Responsible Use of Electronic Communications states that it is not the practice of the university to monitor the content of an individual user's communications. Monitoring occurs only in the event of a reasonable suspicion of a legal or policy violation.

Concerning access to information technology resources that transmit or store individual users' data, OIT/CIT practices provide still another level of protection. Access is made available only if permission is granted by the constituent head: the vice president of Student Affairs and Academic Services if a student, or the vice president of Human Resources if an employee. If the request is about a member of the faculty, the vice president of Human Resources informs the dean of faculty of the request. The OIT/CIT policy office has received approval from the Executive Policy Review Group to draft a university policy that reflects these practices and would apply to all university computing. These practices, and draft policies, offer significantly more dignity to employee users than is provided for by law.

In short, CIT employees do not routinely read your e-mail. (To see if local practices differ, check with your local service provider.) Reasonable suspicion of legal or policy violations may require individual monitoring; the decision to do so is made on a case-by-case basis. OIT/CIT practice for its computers and resources is to escalate that decision to the appropriate vice president. University policy is being drafted that would extend that practice to all Cornell University network resources.

 

WHAT INFORMATION ABOUT ME DOES CORNELL SAVE?

Another common question is, what kind of data is Cornell saving… and who's saving it, why, where, and for how long? No laws, and only limited Cornell policy guidelines for administrative records, set requirements for maintaining logs or backups. However, most Cornell providers do keep logs and backups. To understand what network and computer information is kept, why, and by whom, you need to understand the basic operation of these systems (as explained in the following section), and you need to see a model (as illustrated in the section after that).

Basic Systems Operation

In order for Cornell's network electronics to direct data from one computer to another, each computer on the network is assigned an identification code, or IP (Internet Protocol) address. The wide area network (WAN) electronics process and log these addresses while exchanging e-mail messages, raw data, or other information between computers. What's relevant for this discussion is that the source and destination addresses for communications that leave campus are part of the data that Cornell routinely keeps, showing which computers connected to each other at what time and for how long.

Applications running on Cornell's central systems also track user information. These logs are saved for extended periods of time for future reference to resolve system problems or to track down unauthorized use. In addition, information on these central computer systems is backed up to allow for data recovery in case of a hardware failure.

The main point here is that many forms of electronic communication, like e-mail, can be retained and retrieved at a later time, even if users believe they have deleted this information. While this information is usually maintained for legitimate system administration reasons, these processes can understandably make people nervous.

Data-Retention (Log and Backup) Model

The following model is for general reference only. Process and access control vary from one software application or hardware system to another according to the requirements or judgment of the system owners and administrators.

 


 

 

Box A

Type of data retained: Router transaction logs include the unique numbers of each source and destination computer for off-campus communications, port numbers accessed on those computers, and traffic volume data.

How long retained and why: CIT typically keeps these logs 6 months (plus EZ-Backup life if applicable). Main purposes are troubleshooting, capacity planning, security investigations, cost allocation.

Box B

Type of data retained: Server transaction logs include such data as who logged on to a given server and when, which logon attempts failed, and other flagged activity (no actual messages or file content).

How long retained and why: CIT typically keeps these logs 1 year (plus EZ-Backup life if applicable). Main purposes are troubleshooting, capacity planning, security investigations, cost allocation.

Box C

Type of data retained: Cornell-application "owners"--Bursar, Human Resources, etc.--decide what information to keep on their own departments' servers, including amount of detail (granularity) about users.

How long retained and why: The application owners decide how long to keep this information and for what purposes.

Box D

Type of data retained: Source-system owners decide whether or not to use a backup service, and, if so, what to back up. Your own settings--for example, how long to leave your e-mail on the CIT server--help determine what data is on the server and available to back up. You can even use EZ-Backup with your own computer.

How long retained and why: Each source-system owner decides how long to store these backups. For example, CIT currently stores e-mail backups 7 days. Other data may be stored longer. Main purpose is disaster recovery, so these backups are usually relatively short-term and temporary.
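The retention periods that CIT states in the model can be expressed as a small lookup. This is a minimal sketch under stated assumptions: "6 months" is approximated as 183 days and "1 year" as 365 days, any additional EZ-Backup life is ignored, and the labels and function name are hypothetical. Boxes C and most of D are left out because their owners set the periods.

```python
from datetime import date, timedelta

# Typical CIT retention periods taken from the model above.
# Assumptions: "6 months" is approximated as 183 days, "1 year" as 365 days,
# and extra EZ-Backup life is ignored. The keys are hypothetical labels.
RETENTION = {
    "router_log": timedelta(days=183),   # Box A
    "server_log": timedelta(days=365),   # Box B
    "email_backup": timedelta(days=7),   # Box D (CIT e-mail backups)
}

def is_probably_retained(kind: str, created: date, today: date) -> bool:
    """Return True if a record of this kind would typically still be kept."""
    return today - created <= RETENTION[kind]

today = date(2002, 10, 1)
print(is_probably_retained("email_backup", date(2002, 9, 20), today))  # 11 days old
print(is_probably_retained("router_log", date(2002, 6, 1), today))     # about 4 months old
print(is_probably_retained("server_log", date(2001, 6, 1), today))     # about 16 months old
```

Because local practices vary, a result from a sketch like this says only what CIT would typically keep, not what any particular department keeps.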

 

The following descriptions for boxes A through D expand on CIT's retention practices for the various types of data shown in the preceding model.

A. Router transaction logs. As standard practice, CIT system administrators configure WAN routers to keep logs of users' network transactions that leave Cornell. These logs do not contain actual content, like web site addresses or e-mail messages. However, the data they do contain could, theoretically, allow administrators to infer the nature of any given transaction. While it would be unusual for system administrators to do so, they could potentially learn a lot about a person's activity just from these two types of logged data:

IP (Internet Protocol) numbers, which are unique to individual computers on a network

port numbers accessed on each of those computers

A computer's "ports" are like doors. Just as a bathroom door leads to a bathroom or a kitchen door to a kitchen, port 25 leads to e-mail, port 80 to web data, and so on. Computers can have thousands of ports (port numbers run from 0 to 65535), each providing access to a unique service. (For a complete list of port mappings, see Cornell's list of generic "services" ports.) Since many ports have standardized uses, the IP and port numbers together often reveal the nature of a transaction. For example, a system administrator reviewing the WAN router transaction logs could use IP and port numbers to infer the general nature of a web site--whether research or recreational--accessed by a given user at a given time. While these logs do not track specific navigation (exact web pages or screens accessed), the IP/port pair could fairly accurately suggest the type of information being sought or transmitted.
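The kind of inference described here can be sketched in a few lines. The record layout below is hypothetical and is not the format of CIT's router logs; the IP addresses come from ranges reserved for documentation, and the port table is only a small subset of the standard well-known assignments.

```python
# A small subset of standard well-known port assignments.
WELL_KNOWN_PORTS = {
    21: "FTP (file transfer)",
    22: "SSH (secure shell)",
    25: "SMTP (e-mail)",
    80: "HTTP (web)",
    110: "POP3 (e-mail retrieval)",
    443: "HTTPS (encrypted web)",
}

def infer_service(dest_port: int) -> str:
    """Guess the kind of service from the destination port alone."""
    return WELL_KNOWN_PORTS.get(dest_port, "unknown or non-standard service")

# Hypothetical log records: (source IP, destination IP, destination port, bytes).
log = [
    ("192.0.2.10", "198.51.100.7", 25, 4_200),
    ("192.0.2.10", "203.0.113.5", 80, 310_000),
    ("192.0.2.44", "203.0.113.9", 40000, 12_000),
]

for src, dst, port, size in log:
    print(f"{src} -> {dst}:{port}  {size} bytes  {infer_service(port)}")
```

No message or page content appears anywhere in these records, yet the first two lines already show that one computer sent e-mail and fetched web data, and to which destinations.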

B. Server transaction logs. Each server also saves transaction data, including who logged on to that server and when, which logon attempts failed, and other flagged activity. Like the router transaction logs, server transaction logs contain no messages or file content.
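As one illustration of how such logs support security investigations, the sketch below flags user IDs with repeated failed logons. The record layout, the user IDs, and the threshold of three failures are assumptions made for the example, not a description of any Cornell server's log format or alerting rules.

```python
from collections import Counter

# Hypothetical server log records: (timestamp, user ID, outcome).
records = [
    ("2002-10-01T08:01:12", "abc1", "success"),
    ("2002-10-01T08:03:40", "xyz9", "failure"),
    ("2002-10-01T08:03:45", "xyz9", "failure"),
    ("2002-10-01T08:03:51", "xyz9", "failure"),
    ("2002-10-01T09:15:02", "abc1", "success"),
]

def flag_repeated_failures(records, threshold=3):
    """Return the user IDs with at least `threshold` failed logon attempts."""
    failures = Counter(user for _, user, outcome in records if outcome == "failure")
    return sorted(user for user, count in failures.items() if count >= threshold)

print(flag_repeated_failures(records))
```

The records show who tried to log on and when, which is enough to spot a possible password-guessing attempt without touching any message or file content.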

C. Copies of application data. "Owners" of the hosted applications can copy application data from the servers to their departmental computers for their own business uses. CIT is not involved in deciding what they save or how long they keep it.

D. System and/or file backups. Any user of Cornell's network, including you, has the option to use backup services like EZ-Backup. In fact, all users are encouraged to create regular backups in case of a system crash or other loss of data. CIT backs up all router logs, server logs, and server data every night.

In short, although CIT system administrators retain copious data, they do not regularly monitor the content or nature of this information, whether it is in active transmission or stored. (To see if local practices differ, check with your local service provider.)

Note: Cornell provides no universal guidelines for what data to keep, where to keep it, or how long to keep it, so backup life spans and system log processes vary across campus. These various data-retention practices are not currently tracked or documented.

 

WHAT ABOUT ENCRYPTION?

Although few users currently do, anyone can use special software to "encrypt" files, requiring others to use the appropriate electronic "key" to decipher the content. In fact, encryption is the only way to restrict access to your transaction content, such as e-mail messages or attachments.

Content encryption does have limitations. First, it does not hide IP or port numbers. Also, many applications do not support encryption. What's more, application servers that support encryption may allow central control of the encryption keys, giving system administrators the ability to decrypt the data if necessary.
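The first limitation can be seen in a toy example. The sketch below uses a one-time pad (XOR with a random, single-use key as long as the message) purely as a stand-in for real encryption software; it is not PGP and not a method Cornell is known to use, and the packet layout and addresses are hypothetical.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (one-time pad)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"Meet me at the library at noon."
key = secrets.token_bytes(len(message))  # random key, to be used only once

# Hypothetical packet: the addressing stays in the clear, the content does not.
packet = {
    "src_ip": "192.0.2.10",      # documentation-range addresses
    "dst_ip": "198.51.100.7",
    "dst_port": 25,
    "payload": xor_bytes(message, key),
}

# What an observer without the key can still see:
print(packet["src_ip"], "->", packet["dst_ip"], "port", packet["dst_port"])
print("payload:", packet["payload"].hex())

# What the key holder can recover:
print(xor_bytes(packet["payload"], key).decode())
```

The network needs the addresses and port number to deliver the data, so they remain readable however strong the content encryption is. Anyone holding a copy of the key, such as a server with central key control, can recover the message in the same way the recipient does.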

A policy addressing the encryption of university records or institutional data is in progress. This policy will not require anyone to use encryption, but it will require those who encrypt university records or institutional data to create their own departmental encryption policies and to create procedures for the appropriate escrow of electronic keys. Moreover, the president; the provost; and the heads of units, colleges, and departments may demand the decryption of university records or institutional data on university-owned computers. These officials are the final arbiters of what constitutes such records or data.

Students not working with university records or institutional data are free to encrypt anything they like, to the extent that they transmit or store "personal" data on university-owned computers, as long as they comply with related laws and policies.

In short, if you want to restrict people's ability to decipher your electronic transactions, you must encrypt your data. If you are working with university records or institutional data, you must have a sound policy for the escrow of encryption keys.

Note: You can find a lot of information about encryption on the web. One popular site that offers free downloadable encryption software is the "MIT Distribution Center for PGP (Pretty Good Privacy)."

 

DOES CORNELL SELL INFORMATION ABOUT ME?

Here's one answer that's simple: CIT does not sell e-mail addresses or any other form of electronic communications or directory data to third parties. (To see if local practices differ, check with your local service provider.) Any "spam" or unwanted e-mail you may receive is emphatically not the result of your address being sold but of the technical ability of marketers to "harvest" data from publicly available directories.

 

ACTION ITEMS

Toward the goal of a more consistent approach to, and understanding of, data access and retention across campus, we suggest the following action items:

Stakeholders should review current practices regarding data access and retention for possible revision or statement of purpose.

Cornell should expedite a policy outlining broad classifications for electronic information and updated campus-wide guidelines for data access and retention. (See the current policy, Retention of University Records.)

All users should be made aware of the federal and state laws, university policies, and departmental practices related to data access and retention.

All users should be encouraged to use encryption technologies when appropriate to protect transactions from unauthorized access.

Departments should periodically review local processes to ensure compliance with evolving Cornell policies and changing laws.

 

CLOSING THOUGHTS

Given that faculty, staff and student privacy is impacted by the nature of information technology, and by Cornell's right to access information on its equipment, Cornell is systematically refining campus-wide policy on retention and access to data captured on university systems.

The decentralized control of information technology resources, without policy guidance, implies that access to data may be inconsistent across campus. Users are encouraged to ask their local service providers about their procedures; do not assume that the procedures used by CIT are the same as those of other departments. Ironically, many system administrators may sigh in relief when given clear university guidelines addressing the above concerns. The current ad hoc process places them in the untenable position of holding a wealth of data that many consider personal and private, without the benefit of Cornell guidance for its authorized release.

Finally, all users need to understand that the only true means of securing data transmission--whether on or off campus--is encryption. (Of course, if you elect to encrypt university information, remember that you are obligated to ensure there is a process for retrieving that data in the future.) You may never know if a particular file is GONE gone, but with encryption, you can at least hold the key to understanding the data.

 

Contact for information architecture questions: rdv2@cornell.edu
Contact for law and policy questions: it-policies@cornell.edu

Last modified: Thursday, 31-May-2007 18:02:04 EDT
