Keywords
SEER - SEER-Medicare - outcomes research - colon cancer - rectal cancer
The Surveillance, Epidemiology, and End Results (SEER) program, a clinical database
funded by the National Cancer Institute (NCI), collects data on cancer incidence and
survival from U.S. cancer registries. Case ascertainment and data collection
began on January 1, 1973, succeeding two earlier NCI programs: the End Results
Program and the Third National Cancer Survey.[1][2] During this initial stage, data were
collected from the states of Connecticut, Iowa,
New Mexico, Utah, and Hawaii and the metropolitan areas of Detroit and San Francisco-Oakland.
Between 1974 and 1975, the metropolitan area of Atlanta and the 13-county Seattle-Puget
Sound area were added. These geographic areas are considered the “original nine” SEER
registries. In 1978, ten predominantly African American rural counties in Georgia
were included. Subsequently in 1980, American Indians residing in Arizona were added
to the database. Three additional geographic areas participated in the SEER program
prior to 1990: New Orleans, Louisiana (1974–1977, rejoined in 2001); New Jersey (1979–1989,
rejoined in 2001); and Puerto Rico (1973–1989).[3]
In 1992, the SEER Program was expanded to increase coverage of minority populations,
particularly Hispanics. Los Angeles County and four counties in the San Jose-Monterey
area south of San Francisco were added. In 2001, the SEER Program expanded coverage
to include Kentucky and the remaining counties in California. Additionally, at this
time, New Jersey and Louisiana rejoined the registry. In 2010, the SEER program expanded
coverage to include the entire state of Georgia.
Currently, SEER collects and publishes cancer incidence and survival data from 17
population-based cancer registries covering approximately 30% of the U.S. population.[4] The database is broadly representative of the U.S. population. However, because the
registries cover limited geographic areas, SEER captures a larger share of certain
populations (e.g., Native Hawaiians/Pacific Islanders, Asians, American Indians/Alaska
Natives, and Hispanics) than of Caucasians and African Americans, as shown in [Table 1].[5] Furthermore, the SEER population tends to have a higher proportion of foreign-born
individuals (17.9%) as compared with the general U.S. population (12.8%).[3]
Table 1
Proportions in the overall U.S. population included in SEER Registry
Population | SEER (%)
Native Hawaiian/Pacific Islander | 69.8
Asian | 53.3
American Indian/Alaska Native | 42.2
Hispanic | 40.4
Caucasian | 23.4
African American | 22.7
Abbreviation: SEER, Surveillance, Epidemiology, and End Results.
Data Collection
SEER routinely collects and publishes data on patient-specific and tumor-specific
characteristics. Information collected for each case includes patient demographics,
primary tumor site, tumor morphology, stage at diagnosis, treatment course, follow-up
for vital status, and cause of death. A complete list of variables is described in
[Table 2]. The SEER registry contains information on 9 million cancer cases with over 470,000
new cases added to the database each year. SEER uses the Population Estimates Program
data of the United States Census Bureau and U.S. mortality data, collected and maintained
by the National Center for Health Statistics, for population counts.[3]
Table 2
Review of variables included in SEER Database
Data | Variable | Description
Registry | Registry ID | Unique identifier
Registry | Type of reporting source | Where information came from, including autopsy and death certificate
Registry | Location | State and county at diagnosis
Patient | Patient ID number | Unique identifier
Patient | Demographics | Age, sex, race/ethnicity, Hispanic origin
Patient | Year of birth |
Patient | Marital status | At the time of diagnosis
Tumor | Primary site | ICD-O-3 topography code
Tumor | Date of diagnosis | Month, year
Tumor | Tumor markers | Specific to malignancy
Tumor | Sequence | Specifies if first malignancy and sequence number of reported malignancy
Tumor | Biologic characteristics | Histology, behavior, grade, laterality, size
Tumor | Extent | Extent of disease at the time of diagnosis, lymph node involvement
Tumor | Stage | AJCC T, N, M staging and AJCC stage group
Treatment | Surgery | Surgical procedure/site, extent of lymph node dissection
Treatment | Lymph nodes | Number of regional nodes examined, number of positive regional nodes
Treatment | Radiation therapy | Administration, sequence with surgery, radiation to CNS (yes/no)
Outcomes | Mortality | Date of death, cause of death
Source: Adapted from Dictionary of SEER*Stat Variables 2015. For a more complete review
of variables in SEER, please see http://seer.cancer.gov/data/seerstat/nov2015/.
Rigorous quality control measures are in place to ensure integrity of the dataset.[6] Registries are routinely audited for data accuracy and a Data Quality Profile is
produced for each SEER registry. The program performs regular education and training
sessions in coordination with the National Cancer Registrars Association annual meeting
where registrars are tested through Web-based reliability studies.[5] Additionally, audits of high-volume facilities are performed to ensure that cases
are recorded in a complete and timely fashion. NCI staff work closely with the North
American Association of Central Cancer Registries (NAACCR) to monitor all state registries
to ensure accurate recording of data and compatibility. The database is updated annually
and available for download after completion of a data user agreement free of charge:
https://seer.cancer.gov/data/access.html. Given the enormous, complex structure of the database, SEER provides resources to
assist investigators including SEER*Stat, a free statistical software program to ease
analysis of SEER data which includes survival analysis capability.[7] Patient subgroups can then be exported for use in the usual biostatistical software
packages.
Increased detail has been recorded in the SEER database in recent years. As an example,
since its inception in 1973, stage at diagnosis has been classified into five categories:
in situ, localized, regional, distant, or unstaged. Since 2004, TNM staging data have
been recorded based on American Joint Committee on Cancer (AJCC) staging in addition
to Collaborative Staging Codes, providing more granular clinical and pathologic information.[5] Furthermore, details regarding the presence of extracapsular extension, classification
of “fixed” nodes in head and neck cancers, and estrogen/progesterone/HER2 receptor
status for breast cancer have been reported since 2004.
SEER-Medicare
Medicare provides federally funded health insurance for approximately 97% of individuals
aged 65 years or older in the United States.[8] It also provides health insurance to persons younger than 65 years who have end-stage
renal disease or medical disability. All beneficiaries are entitled to Part A coverage,
which includes hospital inpatient care. Upward of 90% of participants pay to subscribe
to Part B coverage, which covers physician and outpatient services.
In a collaborative effort of the NCI, SEER registries, and the Centers for Medicare
and Medicaid Services (CMS), the SEER database has been linked to claims-based measures
of comorbidities, screening and evaluation tests, and detailed treatment and outcomes
data.[9] Beginning in 1991, a matching algorithm was employed to link cancer data on individual
patients available from the SEER registries to a master Medicare enrollment file. Matching is done
via the patient's name, Social Security number, sex, and date of birth. The database has
been subsequently updated in 1995, 1999, 2003, 2006, 2009, 2012, and most recently
in 2014. For each of the linkages, 93% of individuals aged 65 years and older in the
SEER files were matched to a Medicare enrollment file.[10] The SEER-Medicare linkage is slated to be updated biennially.
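The linkage step described above can be illustrated with a minimal sketch. The source names only the four identifiers used for matching and does not describe the algorithm itself, so the exact-match rule, the `Person` record, and the function names below are hypothetical assumptions rather than the NCI/CMS procedure. All records are synthetic.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Hypothetical identifying fields assumed to exist in both files."""
    name: str
    ssn: str
    sex: str
    birth_date: str  # ISO format, e.g. "1930-04-17"


def match_key(p: Person) -> Tuple[str, str, str, str]:
    """Normalize the four identifiers so trivial formatting differences do not block a match."""
    return (
        " ".join(p.name.upper().split()),  # collapse whitespace, ignore case
        p.ssn.replace("-", ""),            # drop dashes
        p.sex.upper(),
        p.birth_date,
    )


def link(registry: Dict[str, Person], enrollment: Dict[str, Person]) -> Dict[str, Optional[str]]:
    """Map each registry case ID to an enrollment ID, or None if no exact match exists."""
    index = {match_key(person): eid for eid, person in enrollment.items()}
    return {cid: index.get(match_key(person)) for cid, person in registry.items()}


def match_rate(links: Dict[str, Optional[str]]) -> float:
    """Fraction of registry cases that found an enrollment record."""
    if not links:
        return 0.0
    return sum(v is not None for v in links.values()) / len(links)
```

A real linkage would likely tolerate partial disagreement between fields (e.g., a misspelled name with a matching Social Security number); the exact-match rule here is chosen only to keep the sketch short.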
The SEER data included as part of the SEER-Medicare files are in a customized file
known as the Patient Entitlement and Diagnosis Summary File: https://healthcaredelivery.cancer.gov/seermedicare/. Of note, there is a separate data use agreement and significant per-cancer cost
associated with obtaining these data files. This file contains a record for individuals
in the SEER database who have been matched with Medicare enrollment records. Basic
SEER diagnostic information is available for up to 10 diagnosed cancer cases for each
individual. SEER data including cancer incidence, location, stage, initial treatment,
and vital status are linked with Medicare claims for hospital stays, physician and
laboratory services, hospital outpatient claims, and home health/hospice bills. Census
tract and zip code data are available and can be used to draw conclusions regarding
patient socioeconomic data.[4]
To allow for comparison studies with a control group, data are provided for two cohorts:
patients with cancer and a random 5% sampling of Medicare beneficiaries residing in
SEER areas who do not have cancer.[6] The linked SEER-Medicare database allows for a longitudinal perspective in the study
of cancer care.
Strengths of SEER and SEER-Medicare
In capturing approximately 30% of the U.S. population, the SEER database is a very
powerful research tool. The database is enriched with diverse and immigrant populations.
The large sample size and uniform, high-quality data collected allow for accurate
estimates of national cancer incidence and survival rates. This vast patient population
also allows for specific subset analyses to be performed, based on patient characteristics,
tumor stage, and treatment strategies. The SEER program includes long-term follow-up,
providing researchers the ability to analyze temporal trends. Due to the population-based
nature of the registry, all cancers occurring within a defined geographic region are
required to be collected. This serves to minimize potential biases that may be encountered
in facility-based databases where patient referral patterns can confound analysis,
as patients with more severe disease are commonly referred to highly specialized centers.
The quality control program conducted annually by the NCI is a critical component
to ensuring quality and completeness of the database.
The SEER-Medicare database provides an opportunity to conduct case–control studies
utilizing population-based sampling.[8] Employing this linked database allows one to obtain a near-complete census of all
cancers arising in individuals older than 65 years. Furthermore, the SEER-Medicare
database offers researchers a means of studying the following: cancer control practices
and their effect on the cancer burden; patterns of access to cancer care; impact of
comorbidities, race, geographic, socioeconomic, and provider-related factors on access
to care; diagnosis and treatment outcomes (e.g., cause-specific survival analysis);
and cost-effectiveness of cancer care.[11][12][13] The database includes information on multiple disease conditions, allowing researchers
to adjust for other health conditions and prior care (e.g., multivariate and propensity-score
analysis). Inclusion of a control group that does not have cancer is instrumental
for performing comparison studies.
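As an illustration of the cause-specific survival analysis mentioned above, the following is a minimal sketch of a Kaplan-Meier estimate in which only deaths attributed to cancer count as events and all other patients are censored at their last follow-up. The function names and the cause-of-death labels are hypothetical, and the follow-up data in the usage note are invented for illustration, not drawn from SEER.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one event.

    times:  follow-up time for each patient (e.g., months from diagnosis)
    events: True if the event of interest occurred at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def cause_specific_events(causes: Sequence[Optional[str]], cause: str = "cancer") -> List[bool]:
    """Treat deaths from other causes, and patients still alive (None), as censored."""
    return [c == cause for c in causes]
```

For example, with follow-up times `[6, 12, 12, 24, 30]` and causes `["cancer", "other", "cancer", None, "cancer"]`, the death from another cause at 12 months leaves the risk set without lowering the estimated cancer-specific survival.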
Weaknesses of SEER and SEER-Medicare
While the information provided in the SEER database is valuable to the study of oncologic
disease, there are several shortcomings. SEER provides detailed information about
cancer stage and treatment at the time of diagnosis; however, details regarding completion
of therapy and long-term outcomes other than death are not available. The database
lacks information regarding recurrence or disease progression as well as chemotherapy
use, thus prohibiting researchers from making inferences about these key factors and
their impact on oncological outcomes. Furthermore, the SEER database population is
predominantly Medicare/Medicaid based and tends to be biased toward older subjects,
particularly among older records.
Limitations of the SEER-Medicare database surround the lack of data on cancer patients
who do not have Medicare (i.e., those individuals younger than 65 years, privately
insured, Medicaid, and the uninsured). It is important to note that Medicare data
do not include the following: claims for HMO (Health Maintenance Organization) enrollees,
care provided in outside settings (Veterans Administration), care for individuals
with Medicare as the secondary payer, out-of-pocket expenditures, and coverage provided
by Medigap policies.[14]
Although cancer cases and controls are thought to be generalizable to the entire U.S.
elderly population, there are two limitations. First, Medicare eligibility depends
on individuals having Social Security benefits or being married to someone with benefits,
which depends on documentation of work history. The proportion of elderly individuals
who do not qualify is small; however, it is likely that the underprivileged and recent
immigrants are overrepresented in this excluded population. Second, SEER areas were
purposely selected to include a relatively large proportion of racial and ethnic minorities.[6]
Since Medicare coverage is predominantly restricted to elderly people, the SEER-Medicare
data cannot be utilized to evaluate risk factors that arise earlier in life (e.g.,
Crohn's disease).[8] Moreover, studies of the elderly in this linked database are likely not generalizable
to younger populations. Limitations of using the SEER-Medicare registry to conduct
case–control studies surround the completeness and accuracy of Medicare claims to
evaluate certain risk factors, such as exposure. Only conditions diagnosed and documented
by a healthcare provider or related procedure are included in the database. For example,
an asymptomatic or undiagnosed medical condition may impact the sensitivity of an
analysis.
Use of SEER and SEER-Medicare in the Study of Colorectal Cancer
Since 1974, thousands of scientific publications have used the SEER
and SEER-Medicare databases, leaving no doubt about the immense impact this
registry has on oncologic research. The Annual Report to the Nation on the status
of cancer and Racial and Ethnic Patterns of Cancer in the United States are two vital
statistical reviews produced by SEER.[3] With easy access to the database and accommodating statistical software, an increasing
number of SEER-based publications have been produced over the past decade.
For the study of colorectal cancer specifically, the SEER database distinguishes anatomic
subsites into proximal colon, distal colon, and rectum as categorized according to
the International Classification of Diseases for Oncology, third edition (ICD-O-3)
topography codes (anal cancers are also included, but are beyond the scope of this
article). Much of the initial research in colorectal cancer was epidemiologic in nature.
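A minimal sketch of how the subsite grouping could be derived from topography codes follows. The article does not list the codes behind each subsite, and the boundary between proximal and distal colon differs between studies, so the code sets below are an assumption (splenic flexure counted as distal; appendix, overlapping, and unspecified colon left ungrouped). The function names are hypothetical.

```python
# Assumed grouping of ICD-O-3 topography codes; not taken from the article.
PROXIMAL_COLON = {"C18.0", "C18.2", "C18.3", "C18.4"}  # cecum through transverse colon
DISTAL_COLON = {"C18.5", "C18.6", "C18.7"}             # splenic flexure through sigmoid
RECTUM = {"C19.9", "C20.9"}                            # rectosigmoid junction and rectum


def normalize(code: str) -> str:
    """Accept codes written with or without the decimal point (e.g., 'c187' or 'C18.7')."""
    code = code.strip().upper()
    if "." not in code and len(code) == 4:
        code = f"{code[:3]}.{code[3]}"
    return code


def subsite(code: str) -> str:
    """Classify a topography code into the anatomic subsites discussed in the text."""
    code = normalize(code)
    if code in PROXIMAL_COLON:
        return "proximal colon"
    if code in DISTAL_COLON:
        return "distal colon"
    if code in RECTUM:
        return "rectum"
    if code.startswith("C21"):
        return "anus"
    return "other or unspecified"
```

Keeping the code sets as named constants makes it easy to rerun an analysis under a different proximal/distal definition as a sensitivity check.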
Screening
The SEER database has been utilized to evaluate the impact of screening in colorectal
cancer. Researchers first employed the database for this purpose in 1990, when they
examined the public health impact of the 1985 mass media coverage of President Reagan's colon
cancer episode.[15] They found an increase in incidence of early-stage colorectal cancers in the months
following the President's diagnosis, suggesting a potential screening effect. In 1994,
researchers used incidence and survival data from SEER to examine reasons for the
significant decline in colorectal cancer mortality rates for both Caucasian males
and females that began in 1985.[16] Results of this study demonstrated the important role of screening to detect early-stage
cancer for reducing mortality.
Racial Disparities
A more recent study used the SEER database to calculate the age-specific incidence
in colorectal cancer in African Americans as compared with Caucasians while controlling
for differences in socioeconomic status to evaluate the disagreement regarding the
age at which to initiate screening in African Americans.[17] Based on the results of this study, a disparity was seen in the age-specific incidence
of colorectal cancer in African Americans as compared with Caucasians beginning at
45 years of age. Findings from this study may help policymakers (e.g., the U.S. Preventive
Services Task Force) decide how to focus their efforts on improving screening rates
for colorectal cancer and which specific populations should be targeted.
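The age-specific comparison described above rests on simple arithmetic: within each age band, the number of cases is divided by the population at risk and expressed per 100,000, and the rates of two groups are then compared. A minimal sketch follows; the function names are hypothetical, the adjustment for socioeconomic status used in the cited study is not modeled, and any counts supplied are the user's own, not SEER data.

```python
from typing import Dict, Tuple


def incidence_per_100k(cases: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases * 100_000 / person_years


def age_specific_rate_ratios(
    group_a: Dict[str, Tuple[int, float]],
    group_b: Dict[str, Tuple[int, float]],
) -> Dict[str, float]:
    """Rate ratio (group A / group B) for each age band present in both groups.

    Each input maps an age band label to (cases, person_years).
    Bands in which group B has a zero rate are skipped.
    """
    ratios = {}
    for band, (cases_a, py_a) in group_a.items():
        if band not in group_b:
            continue
        cases_b, py_b = group_b[band]
        rate_b = incidence_per_100k(cases_b, py_b)
        if rate_b > 0:
            ratios[band] = incidence_per_100k(cases_a, py_a) / rate_b
    return ratios
```

A ratio above 1 in a given age band indicates a higher incidence in group A at that age, which is the kind of signal used to argue for an earlier screening age.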
The SEER database has been used to investigate racial disparities in colorectal cancer
for several decades. A study by Robbins et al used SEER data from 1985 to 2008 to
evaluate stage-specific colorectal cancer mortality rates by race.[18] Several subsequent studies have used the SEER registry to further evaluate this
disparity.[19][20][21]
Young-Onset Colorectal Cancer
Survival analyses of colorectal cancer patients in the SEER registry demonstrate that
young patients with colorectal cancer have a higher cancer-specific survival rate
following resection as shown in [Fig. 1], although they present with more unfavorable tumor biology and a greater proportion
present at an advanced stage.[22] The database has also been used to evaluate gender disparities in metastatic colorectal
cancer survival.[23] A study by Hendifar et al revealed that younger women with metastatic colorectal
cancer exhibit a survival advantage as compared with their male counterparts, suggesting
that hormonal status may be of prognostic significance.[23]
Fig. 1 Survival curves in colorectal cancer patients according to age status.[22]
Rectal Cancer
A study by Lee et al used the database to compare differences in stage-specific survival
between colon and rectal cancer patients.[24] The researchers demonstrated that colon cancer patients had better survival than
those with rectal cancer, by a margin of 4 months in stage IIB ([Fig. 2a], [b]). However, in stage IIIC and stage IV, rectal cancer patients had better survival
than colon cancer patients by approximately 3 months.
Fig. 2 Survival and cumulative hazard of stage IV colon and rectal cancer patients (1, colon
cancer; 2, rectal cancer).
The SEER database has also been used to investigate the impact of rural versus urban
setting on the stage at presentation of colorectal cancer.[25] In this retrospective analysis, investigators concluded that residence in an urban
setting as compared with a rural environment was associated with later-stage colorectal
cancer at presentation.
Analysis of the SEER database has demonstrated that the incidence of rectal cancer
is increasing in patients younger than 40 years.[26] Using the detailed histology data recorded in SEER, researchers were able to determine
that individuals in this population were 3.6 times more likely to have signet cell
histology.[27] Current staging guidelines for rectal cancer have been reviewed using information
from the SEER database. Gunderson et al helped to validate changes in AJCC staging
for rectal cancer by supporting the shift of T1–2N2 lesions from IIIC to IIIA or IIIB
and T4bN1 from IIIB to IIIC.[28] This study also supported subdividing T4, N1, and N2 and revised the substaging
of stages II and III rectal cancer.
SEER-Medicare
Use of the linked SEER-Medicare database has allowed for a variety of analyses that
span the course of colorectal cancer ranging from screening and detection to terminal
care and mortality. Much of the research using this database is focused on health
disparities, quality of care, and cost of treatment.[12][29] Investigators have used the SEER-Medicare database to assess the impact of surgeon
and hospital procedure volume on rectal cancer outcomes.[30] Results from this study concluded that surgeon-specific volume was associated with
2-year mortality and remained an important predictor of rectal cancer outcome even
after adjustment for hospital volume.
The linked database has also been used to estimate the relative impact of changes
in demographics, stage at detection, treatment mix, and medical technology on 5-year
survival among older colorectal cancer patients.[31] The linked database allows estimates of cancer-related medical costs by site, stage
of disease, treatment approach, and gender. In a study by Brown et al, data on Medicare
payments were obtained for colorectal cancer patients during 1990–1994 from the SEER-Medicare
database.[32] This study demonstrated that valid estimates of cancer-related long-term cost can
be obtained from administrative claims data linked to incidence cancer registry data.
Patterns of therapy regimens and their efficacy have also been analyzed using the
linked SEER-Medicare database.[33][34][35][36][37][38][39][40][41][42] Haynes et al found that although neoadjuvant chemoradiation followed by tumor resection
and postoperative chemotherapy is the standard of care for patients with clinical
stage II or III adenocarcinoma of the rectum, significant variation exists in the
receipt of postoperative chemotherapy after resection in the elderly population with
more than one in three patients failing to receive adjuvant therapy.[43]
The abovementioned studies are just a selection of the numerous publications produced
using the SEER and SEER-Medicare databases for the study of colorectal cancer.
Comparison of SEER to Other National Databases
Increasingly, clinical research is being performed using national and local databases
in the study of colorectal cancer. Other national registries that are comparable to
SEER in terms of size and impact include the National Cancer Database (NCDB) created
by the American College of Surgeons and the American Cancer Society, the National
(Nationwide) Inpatient Sample (NIS), and the University HealthSystem Consortium (UHC)
databases. While clinical databases (NCDB, SEER) tend to be more focused on oncologic
disease incidence, treatment, and patient outcomes, the administrative databases (NIS,
UHC) include data that focus on cost and hospital/provider characteristics. Administrative
databases were not originally designed for clinical research, but instead to track
billing for hospitals, providers, and procedures.[4] Administrative data are typically derived from two sources: requests to insurers
for healthcare payments and claims for clinical services and therapies. In contrast,
clinical databases were developed with specific clinical goals. The geographic catchment
areas of the databases also vary. They may be national, state-based, or limited to
specific locations. A comparison of the data available in each of these databases
is depicted in [Table 3].
Table 3
Comparison of national clinical and administrative databases used in colorectal cancer
research
Characteristic | SEER | NCDB | NIS | UHC
Type of data | Clinical | Clinical | Administrative | Administrative
Patient population | 30% of U.S. population, 17 population-based cancer registries | 70% of all U.S. cancer cases, COC-approved hospitals only | 20% sampling of all hospital admissions | 90% patients at nonprofit, academic medical centers
Cancer staging data | Yes | Yes | No | No
Cancer treatment: surgery | Yes | Yes | Yes | Yes
Cancer treatment: chemotherapy | No | Yes | No | Yes
Cancer treatment: radiation therapy | Yes | Yes | No | Yes
30-d outcomes | No | Yes | No | Yes
5-y mortality | Yes | Yes | No | No
Surgeon-specific data | No | No | Yes | Yes
Availability | Publicly available | American College of Surgeon members | Available for fee | UHC member institutions
Linkable to other databases | Yes | No | Yes | Yes
Web site | http://seer.cancer.gov/ | http://www.facs.org/cancer/ncdb/ | https://www.hcup-us.ahrq.gov/db/nation/nis/nisdbdocumentation.jsp | https://www.vizientinc.com/Login.htm
Abbreviations: NCDB, National Cancer Database; NIS, National (Nationwide) Inpatient
Sample; SEER, Surveillance, Epidemiology and End Results Program; UHC, University
Health System Consortium.
Future Work on Colorectal Cancer Using SEER
As more investigators are utilizing SEER and SEER-Medicare databases for outcomes
research, there are ways these registries could be more effectively applied to further
our understanding of colorectal cancer and improve patient care. Future studies targeted
at improved staging and treatment algorithms will allow personalized therapy in the
treatment of colorectal cancer such as watchful waiting in rectal cancer. Disparities
in care received in different geographical regions and in different patient subsets
need to be better identified and understood to promote national efforts for improvements
in the quality of care delivered to patients. A greater emphasis on primary prevention
and early detection is crucial to counter the effects of our aging and expanding population.[44]
Summary and Conclusion
SEER and SEER-Medicare are valuable databases used to understand the natural history
of colorectal cancer and to evaluate the effectiveness of therapies. Data collected
in these registries have served to establish and validate staging strategies, evaluate
regional treatment variation, and identify disparities in care. Appropriate study
design and thoughtful analyses allow investigators to make novel discoveries and answer
key clinical questions in oncologic care. Understanding the strengths and limitations
of these large databases is essential to perform quality surgical outcomes research.