As part
of the IT Architecture Initiative, the Office of Information Technologies
(OIT) is producing a series of papers outlining directions in information
technology architecture. In the spirit of RFCs (requests for comments),
the papers are written to foster understanding and to open dialogue
about information technology trends at Cornell, with the goal of improving
the use of information technology services throughout Cornell. This
paper and the others in this series can be found on the IT
Architecture Initiative web site (www.cit.cornell.edu/oit/Arch-Init/papers.html).
This paper
was prepared in conjunction with the OIT policy advisor's office.
Can Cornell Read My E-mail? Are My Deleted Files GONE gone? The Facts About Data
Access and Retention at Cornell
Prepared by R.
David Vernon, Tracy Mitrano, and Marcia Poulsen
10/02
SYNOPSIS
This
paper reviews the ability of Cornell information technology service
providers to access information transmitted or stored on Cornell systems.
It also touches on related laws and policies. The sections are as follows:
Realities of the electronic age
Is Cornell reading my e-mail?
What information about me does Cornell save?
What about encryption?
Does Cornell sell information about me?
Action items
Closing thoughts
REALITIES OF THE ELECTRONIC AGE
The intimacy
people feel with their computers contrasts sharply with the fact that
network operators can see electronic communications, governments with
proper authorization can intercept transmissions or obtain stored data,
and snoops or hackers can all too easily intercept communications or
invade an individual's computer. For some who have expressed emotions
or political thoughts in e-mail, it can be a shock to learn that their
messages have been posted on the web or widely circulated as the result
of easy forwarding. Electronic diaries and wills have been sent out
as documents as the result of a computer virus. The compromise of a
credit card or social security number produces obvious credit problems.
The potential for harassing or defamatory messages to be put on the
web for the world to see is sobering.
Given such
realities, it's not surprising that Cornell's OIT policy officer is
often asked--in the same breath with the word "privacy"--questions like
these:
1. Technically, can Cornell system administrators read my e-mail?
2. DO they read it?
3. When I delete a file from a central computer (server), is it GONE gone?
4. If not, who keeps it, what do they keep, why, where, and for how long?
5. What web-related data can system administrators see, and what can they
infer from it?
6. Does Cornell monitor the content of transactions on its network?
Here are
the "quick" answers--which show that there are no quick answers:
1. Yes... unless you encrypt it.
2. No... with rare exceptions.
3. Probably not.
4. Depends, depends, depends, depends, and depends.
5. Perhaps more, or perhaps less, than you might guess.
6. Only under extreme circumstances.
For fuller
answers to these and similar questions, read on. Please keep in mind
that, given the distributed nature of network computing at Cornell,
terms like "system operator" and "network administrator" may apply to
Cornell Information Technologies (CIT) employees, or to employees within
Cornell's individual academic or administrative departments, or to all.
In some cases these groups share common policies and practices; in other
cases they don't. (To see if local practices differ, check with your local
service provider.)
Please
also keep in mind that, in the opinion of the principal authors of this
paper, CIT system and network operators take pains to protect the "privacy"
of computer users in the Cornell community. Still, some concerns remain
to be addressed. It is our hope that what you read here will encourage
you to engage in informed discussions, moving the campus forward in
its collective understanding of, and approach to, these issues.
Note:
Because the term "privacy" is ambiguous and potentially misleading in
the context of this discussion, this paper will generally avoid it in
favor of language more specifically related to data access and retention.
For a broader discussion, see "Does
the Postman Read Twice: 'Privacy' Considerations of Law and Policy of
Electronic Communications at Cornell University."
IS CORNELL READING MY E-MAIL?
Let's
start with this common concern: Is anyone out there reading my e-mail?
Given the nature of the technology, we must accept that Cornell's
network operators, like their counterparts elsewhere, have the ability
to see most e-mail messages. In fact, they do occasionally see such
communications while performing necessary jobs.
A couple
of analogies illustrate the transparency of unencrypted data. E-mail
messages have been compared to postcards; any postal worker could read
what's written on a postcard. More to the point, e-mail is like
a telephone conversation; phone company operators or technicians can
and sometimes do break into live communications in the course of their
duties.
It's
important, though, to distinguish between what Cornell system operators
can see and what they do see. Sheer volume of e-mail renders
the routine reading of messages impossible. In fiscal year 2001-2002
alone, 358 million messages were routed through CU networks.
But volume is not the only impediment. OIT/CIT employees and contractors
sign an agreement requiring the confidentiality of all data to which
the person has access. Even if system or network operators happen to observe an e-mail message in the course of standard business procedures, they are obliged to maintain its confidentiality. (Of course, if they observe material that they reasonably believe violates law or policy, they are obliged, under the university's Responsible
Use of Electronic Communications policy, to report that violation to the Office of Judicial Administration, the Cornell police, or the OIT policy advisor.)
In addition
to this protection, the law may offer some safeguards as well, although
how far they extend is not settled. The Electronic Communications
Privacy Act (ECPA) criminalizes the disclosure of electronic communications
except in the case of authorized legal papers, in the case of one party's
consent, or in the case of an emergency defined as immediate danger
to life and limb. This law, however, applies only to "public communications,"
not to private networks. Case law has yet to decide whether ECPA applies
to colleges and universities or to specific constituencies or mailings,
such as students or list services that go out to the public. To date,
employment law grants no privacy protection for employee users of private
networks.
Cornell
University "raises the bar" a little higher. University policy
5.1 on the Responsible
Use of Electronic Communications states that it is not the practice
of the university to monitor the content of an individual user's communications.
Monitoring occurs only in the event of a reasonable suspicion of a legal
or policy violation.
Concerning
access to information technology resources that transmit or store individual
users' data, OIT/CIT practices provide still another level of protection.
Access is made available only if permission is granted by the constituent
head: the vice president of Student Affairs and Academic Services if
a student, or the vice president of Human Resources if an employee.
If the request is about a member of the faculty, the vice president
of Human Resources informs the dean of faculty of the request. The OIT/CIT
policy office has received approval from the Executive Policy Review
Group to draft a university policy that reflects these practices and
would apply to all university computing. These practices, and draft
policies, offer significantly more dignity to employee users than is
provided for by law.
In short,
CIT employees do not routinely read your e-mail. (To see if local practices
differ, check with your local service provider.) Reasonable suspicion
of legal or policy violations may require individual monitoring; the
decision to do so is made on a case-by-case basis. OIT/CIT practice
for its computers and resources is to escalate that decision to the
appropriate vice president. University policy is being drafted that
would extend that practice to all Cornell University network resources.
WHAT INFORMATION ABOUT ME DOES CORNELL SAVE?
Another
common question is, what kind of data is Cornell saving
and who's
saving it, why, where, and for how long? No laws, and only limited Cornell policy guidelines for administrative records,
set requirements for maintaining logs or backups. However, most Cornell
providers do keep logs and backups. To understand what network and
computer information is kept and why and by whom, etc., you need to
understand the basic operation of these systems (as explained in the
following section), and you need to see a model (as illustrated in the
section after that).
Basic Systems Operation
In order
for Cornell's network electronics to direct data from one computer
to another, each computer on the network is assigned an identification
code, or IP (Internet Protocol) address. The wide area network (WAN)
electronics process and log these addresses while exchanging e-mail
messages, raw data, or other information between computers. What's
relevant for this discussion is that the source and destination addresses
for communications that leave campus are part of the data that Cornell
routinely keeps, showing which computers connected to each other at
what time and for how long.
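To make the idea concrete, here is a minimal sketch of what such a connection record might look like and how a logger could keep only the records for traffic that leaves campus. It is an illustration only, not CIT's actual log format or software: the `ConnectionRecord` fields, the function names, and the `10.0.0.0/8` campus block are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network

# Hypothetical campus address block; a placeholder, not Cornell's real range.
CAMPUS_NET = ip_network("10.0.0.0/8")


@dataclass(frozen=True)
class ConnectionRecord:
    source: str          # IP address of the computer that started the connection
    destination: str     # IP address of the computer it connected to
    started: datetime    # when the connection began
    duration: timedelta  # how long it lasted


def leaves_campus(record: ConnectionRecord) -> bool:
    """Return True when the destination lies outside the campus block."""
    return ip_address(record.destination) not in CAMPUS_NET


def records_to_keep(records):
    """Keep only the records for connections that leave campus."""
    return [r for r in records if leaves_campus(r)]


sample = [
    # Campus to campus: not retained in this sketch.
    ConnectionRecord("10.1.2.3", "10.4.5.6",
                     datetime(2002, 10, 1, 9, 0), timedelta(seconds=30)),
    # Campus to an off-campus address (203.0.113.0/24 is a documentation range).
    ConnectionRecord("10.1.2.3", "203.0.113.5",
                     datetime(2002, 10, 1, 9, 5), timedelta(minutes=2)),
]

kept = records_to_keep(sample)
for r in kept:
    print(r.source, "->", r.destination, r.started.isoformat(), r.duration)
```

Note that the record holds addresses and times only. Nothing about the content of the communication appears in it.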
Applications
running on Cornell's central systems also track user information.
These logs are saved for extended periods of time for future reference
to resolve system problems or to track down unauthorized use. In addition,
information on these central computer systems is backed up to allow
for data recovery in case of a hardware failure.
The main
point here is that many forms of electronic communication, like e-mail,
can be retained and retrieved at a later time, even if users believe
they have deleted this information. While this information is usually
maintained for legitimate system administration reasons, these processes
can understandably make people nervous.
Data-Retention (Log and Backup) Model
The following
model is for general reference only. Process and access control vary
from one software application or hardware system to another according
to the requirements or judgment of the system owners and administrators.

| Box | Type of data retained | How long retained & why |
|-----|-----------------------|-------------------------|
| A | Router transaction logs include the unique numbers of each source and destination computer for off-campus communications, port numbers accessed on those computers, and traffic volume data. | CIT typically keeps these logs 6 months (plus EZ-Backup life if applicable). Main purposes are troubleshooting, capacity planning, security investigations, cost allocation. |
| B | Server transaction logs include such data as who logged on to a given server and when, which logon attempts failed, and other flagged activity (no actual messages or file content). | CIT typically keeps these logs 1 year (plus EZ-Backup life if applicable). Main purposes are troubleshooting, capacity planning, security investigations, cost allocation. |
| C | Cornell-application "owners"--Bursar, Human Resources, etc.--decide what information to keep on their own departments' servers, including amount of detail (granularity) about users. | The application owners decide how long to keep this information and for what purposes. |
| D | Source-system owners decide whether or not to use a backup service, and, if so, what to back up. Your own settings--for example, how long to leave your e-mail on the CIT server--help determine what data is on the server and available to back up. You can even use EZ-Backup with your own computer. | Each source-system owner decides how long to store these backups. For example, CIT currently stores e-mail backups 7 days. Other data may be stored longer. Main purpose is disaster recovery, so these backups are usually relatively short-term and temporary. |
The
following descriptions for boxes A through D expand on CIT's retention
practices for the various types of data shown in the preceding model.
Router
transaction logs. As standard practice, CIT system administrators
configure WAN routers to keep logs of users' network transactions
that leave Cornell. These logs do not contain actual content, like web
site addresses or e-mail messages. However, the data they do contain
could, theoretically, allow administrators to infer the nature of any
given transaction. While it would be unusual for system administrators
to do so, they could potentially learn a lot about a person's activity
just from these two types of logged data:
IP (Internet Protocol) numbers, which are unique to individual computers on a network
port numbers accessed on each of those computers
A computer's
"ports" are like doors. Just as a bathroom door leads to a
bathroom or a kitchen door to a kitchen, port 25 leads to e-mail, port
80 to web data, and so on. Computers can have hundreds of ports, each
providing access to a unique service. (For a complete list of port mappings,
see Cornell's list of generic
"services" ports.) Since many ports have standardized
uses, together the IP and port numbers often reveal the nature of a
transaction. For example, a system administrator reviewing the WAN router
transaction logs could use IP and port numbers to infer the general
nature of a web site--whether research or recreational--accessed by
a given user at a given time. While these logs do not track specific
navigation (exact web pages or screens accessed), the IP/port pair could
fairly accurately imply the type of information being sought or transmitted.
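The inference described above can be sketched in a few lines. This is an illustration under stated assumptions, not a tool CIT uses: the function names are hypothetical, and the table holds only a handful of standard port assignments (25 and 80 are the two mentioned in the text; 110 and 443 are added as further well-known examples).

```python
# A few standardized port assignments. Ports 25 and 80 come from the text;
# 110 and 443 are added here as other well-known examples.
WELL_KNOWN_PORTS = {
    25: "e-mail (SMTP)",
    80: "web (HTTP)",
    110: "e-mail retrieval (POP3)",
    443: "secure web (HTTPS)",
}


def infer_service(port: int) -> str:
    """Guess the kind of transaction from the destination port alone."""
    return WELL_KNOWN_PORTS.get(port, "unknown or non-standard service")


def describe(dest_ip: str, port: int) -> str:
    """Summarize what a logged IP/port pair suggests, without any content."""
    return f"{dest_ip} port {port}: {infer_service(port)}"


# 203.0.113.5 is a documentation address used purely as a placeholder.
print(describe("203.0.113.5", 25))
print(describe("203.0.113.5", 80))
print(describe("203.0.113.5", 6000))
```

The guess is only as good as the convention: a service can run on a non-standard port, which is why the text says the pair "often" reveals the nature of a transaction rather than always.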
Server
transaction logs. Each server also saves transaction data, including
who logged on to that server and when, which logon attempts failed,
and other flagged activity. Like the router transaction logs, server
transaction logs contain no messages or file content.
Copies
of application data. "Owners" of the hosted applications
can copy application data from the servers to their departmental computers
for their own business uses. CIT is not involved in deciding what they
save or how long they keep it.
System
and/or file backups. Any user of Cornell's network, including
you, has the option to use backup services like EZ-Backup. In fact,
all users are encouraged to create regular backups in case of a system
crash or other loss of data. CIT backs up all router logs, server logs,
and server data every night.
In short,
although CIT system administrators retain copious data, they do not
regularly monitor the content or nature of this information, whether
it is in active transmission or stored. (To see if local practices differ,
check with your local service provider.)
Note:
Cornell provides no universal guidelines for what data to keep,
where to keep it, or how long to keep it, so backup life spans and system
log processes vary across campus. These various data-retention practices
are not currently tracked or documented.
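As a rough illustration of how the typical CIT retention periods in the model translate into dates, the sketch below computes the earliest day a record could be purged. It rests on several assumptions: "6 months" and "1 year" are approximated as 183 and 365 days, any additional EZ-Backup life is ignored, and the names `RETENTION`, `earliest_purge_date`, and `may_still_exist` are hypothetical. Local practices elsewhere on campus may differ entirely.

```python
from datetime import date, timedelta

# Typical CIT periods from the model, approximated in days (an assumption).
# Additional EZ-Backup life, where applicable, is not modeled.
RETENTION = {
    "router_log": timedelta(days=183),   # "6 months"
    "server_log": timedelta(days=365),   # "1 year"
    "email_backup": timedelta(days=7),   # "7 days"
}


def earliest_purge_date(kind: str, created: date) -> date:
    """Earliest date on which a record of this kind could be discarded."""
    return created + RETENTION[kind]


def may_still_exist(kind: str, created: date, today: date) -> bool:
    """True while the record is still inside its typical retention period."""
    return today < earliest_purge_date(kind, created)


backup_made = date(2002, 10, 1)
print(earliest_purge_date("email_backup", backup_made))
print(may_still_exist("email_backup", backup_made, date(2002, 10, 5)))
print(may_still_exist("email_backup", backup_made, date(2002, 10, 9)))
```

In this sketch, a message deleted the day after a nightly backup could still sit in that backup for about a week, which is why "deleted" does not immediately mean gone.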
WHAT ABOUT ENCRYPTION?
Although
few users currently do, anyone can use special software to "encrypt"
files, requiring others to use the appropriate electronic "key"
to decipher the content. In fact, encryption is the only way to restrict
access to your transaction content, such as e-mail messages or attachments.
Content
encryption does have limitations. First, it does not obfuscate IP or
port numbers. Also, many applications do not support encryption. What's
more, application servers that support encryption may allow central
control of the encryption keys, giving system administrators the ability
to decrypt the data if necessary.
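The first and last of these limitations can be seen in a toy example. The sketch below uses a one-time pad (XOR with a random key as long as the message) purely because it fits in a few lines of standard-library Python. It is not PGP, not what any Cornell service uses, and not suitable for real data. The `Message` class and all names are hypothetical.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Message:
    dest_ip: str     # visible to the network regardless of encryption
    dest_port: int   # visible to the network regardless of encryption
    payload: bytes   # the only part that encryption protects


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each byte with a key byte. The same call decrypts."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"Meet me at noon."
key = secrets.token_bytes(len(plaintext))  # random key, used once

# Only the payload is encrypted; the addressing information is not.
sent = Message("203.0.113.5", 25, xor_bytes(plaintext, key))

# What a router log could still record about this message.
logged = (sent.dest_ip, sent.dest_port)
print(logged)

# Anyone holding a copy of the key, including an escrow holder or a
# server that manages keys centrally, can recover the content.
recovered = xor_bytes(sent.payload, key)
print(recovered == plaintext)
```

The destination address and port stay readable even though the content is not, and whoever holds the key holds the content.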
A policy
addressing the encryption of university records or institutional data
is in progress. This policy will not require anyone to use encryption,
but it will require those who encrypt university records or institutional
data to create their own departmental encryption policies and to create
procedures for the appropriate escrow of electronic keys. Moreover,
the president; the provost; and the heads of units, colleges, and departments
may demand the decryption of university records or institutional data
on university-owned computers. These officials are the final arbiters
of what constitutes such records or data.
Students
not working with university records or institutional data are free to
encrypt anything they like, to the extent that they transmit or store
"personal" data on university-owned computers, as long as
they comply with related laws and policies.
In short,
if you want to restrict people's ability to decipher your electronic
transactions, you must encrypt your data. If you are working with university
records or institutional data, you must have a sound policy for the
escrow of encryption keys.
Note: You can find a lot of information about encryption on the
web. One popular site that offers free downloadable encryption software
is the "MIT Distribution
Center for PGP (Pretty Good Privacy)."
DOES CORNELL SELL INFORMATION ABOUT ME?
Here's
one answer that's simple: CIT does not sell e-mail addresses or
any other form of electronic communications or directory data to third
parties. (To see if local practices differ, check with your local service
provider.) Any "spam" or unwanted e-mail you may receive is
emphatically not the result of your address being sold but of the technical
ability of marketers to "harvest" data from publicly available
directories.
ACTION ITEMS
Toward
the goal of a more consistent approach to, and understanding of, data
access and retention across campus, we suggest the following action
items:
Stakeholders
should review current practices regarding data access and retention
for possible revision or statement of purpose.
Cornell
should expedite a policy outlining broad classifications for electronic
information and updated campus-wide guidelines for data access and
retention. (See the current policy, Retention
of University Records.)
All users
should be made aware of the federal and state laws, university policies,
and departmental practices related to data access and retention.
All users
should be encouraged to use encryption technologies when appropriate
to protect transactions from unauthorized access.
Departments
should periodically review local processes to assure compliance with
evolving Cornell policies and changing laws.
CLOSING THOUGHTS
Given that
faculty, staff and student privacy is impacted by the nature of information
technology, and by Cornell's right to access information on its
equipment, Cornell is systematically refining campus-wide policy on
retention and access to data captured on university systems.
The decentralized
control of information technology resources, without policy guidance,
implies that access to data may be inconsistent across campus. Users
are encouraged to ask their local service providers about their procedures;
do not assume that the procedures used by CIT are the same as those used by other
departments. Ironically, many system administrators may sigh in relief
when given clear university guidelines addressing the above concerns.
The current ad hoc process places them in the untenable position of
holding a wealth of data that many consider personal and private, without
the benefit of Cornell guidance for its authorized release.
Finally, all
users need to understand that the only true means of securing data transmission--whether
on or off campus--is encryption. (Of course, if you elect to encrypt university
information, remember that you are obligated to assure a process to retrieve
that data in the future.) You may never know if a particular file is GONE
gone, but with encryption, you can at least hold the key to understanding
the data.
Contact for information architecture questions: rdv2@cornell.edu
Contact for law and policy questions: it-policies@cornell.edu
Last modified:
Thursday, 31-May-2007 18:02:04 EDT