Authenticating Human Cell Lines STR Kits Capillary Electrophoresis Application Note
In this application note, we show:
• The identity of cell lines grown in vitro can be verified using the Applied Biosystems™ Cell Line Authentication (CLA) IdentiFiler™ Plus and Direct kits, and CLA GlobalFiler™ PCR Amplification Kit
• Cell lines can be authenticated using as little as 100 pg of purified genomic DNA (gDNA)
• Cell lines can be authenticated directly from punches of cells spotted onto NUCLEIC-CARD™ sample collection devices
• The Applied Biosystems™ SeqStudio™ Genetic Analyzer gives high performance with low concentrations of gDNA and cells

Introduction
The study of human diseases relies heavily on the analysis of dissociated human cell lines grown in culture. However, an increasingly acknowledged problem is that cells grown in vitro can be misidentified or become contaminated with other, unrelated cell lines [1]. Misidentification of cell lines produces misleading results, confusion, and added costs to research [2–4]. Journals and funding agencies now require researchers to ascertain that the cell lines they use are authentic, and to identify strategies for ensuring they remain so over the course of a study (for examples, see Yu et al., 2015 [5] and Neimark, 2015 [6]).

Establishing the authenticity or provenance of a cell line often involves analysis of variants or alleles of several loci and making sure these variants match expected alleles. There are several methods for analyzing variant loci, ranging from electrophoretic analysis of isozyme variants through analysis of restriction fragment length polymorphisms (RFLPs) and amplified fragment length polymorphisms (AFLPs), to next-generation sequencing (NGS) and analysis with MALDI-TOF mass spectrometry. However, many of these methods suffer from at least one drawback, ranging from insufficient complexity to high costs. Nevertheless, analysis of highly variable short tandem repeat (STR) markers, a well-established technique commonly used in DNA forensic analysis, can provide a simple, inexpensive, and highly specific genetic “fingerprint” of a cell line. Comparing a profile of alleles present at these highly variant loci to known, standardized samples of a cell line provides confidence that the cell line is authentic. Organizations such as ATCC and the Leibniz Institute DSMZ (German Collection of Microorganisms and Cell Cultures) provide online access to searchable databases that allow investigators to query known cell types. Alternatively, researchers can establish an allelic profile of cell lines unique to their lab, and over time compare the allelic profiles of the cells to their own internal standards to ensure that the cell identities are true.
Typically, analysis of STRs is performed by capillary electrophoresis (CE) of fragments amplified from microsatellite loci with varying number of repeats. We offer instruments that are optimized for researchers’ needs in sensitivity and throughput. Furthermore, the Applied Biosystems™ product portfolio has several different kits for PCR-based STR fingerprinting for use on CE instruments. The CLA IdentiFiler Plus PCR Amplification Kit has been optimized to analyze 16 highly variant human STRs over a wide range of purified gDNA preparations. The CLA IdentiFiler Direct PCR Amplification Kit was first developed to analyze the same 16 STR loci, starting from dried blood or buccal spots (for example, on NUCLEIC-CARD devices) or buccal swabs. For the NUCLEIC-CARD device, a 1.2 mm punch from the card is placed directly into a PCR tube or well, and amplified without any further purification. When extra levels of discrimination are needed, the CLA GlobalFiler PCR Amplification Kit allows 6-dye analysis of 24 loci, 16 of which are included in the IdentiFiler kits.

Software solutions facilitate analysis of STRs by making use of pre-established allelic ladders and sizing bin sets for the various STR alleles covered by the IdentiFiler kits. An illustration of the complete workflows for cell line authentication is shown in Figure 1.

The latest member of our CE instrument family, the SeqStudio Genetic Analyzer, has several new features aimed at making CE analysis easier. The SeqStudio Genetic Analyzer has a completely redesigned user interface driven by an on-instrument touchscreen and integrated computer, facilitating setup and runs. Embedded run modules and a removable slide-in cartridge allow the SeqStudio Genetic Analyzer to be used for either Sanger sequencing or fragment analysis without any reconfiguration. This genetic analyzer provides maximum flexibility and ease of use, making it an ideal platform for many applications, such as cell line authentication when coupled with the CLA IdentiFiler and GlobalFiler kits.
(A) Spot cells onto NUCLEIC-CARD device → Amplify using CLA IdentiFiler Direct kit → Fragment analysis by CE → Analyze using GeneMapper Software 6
(B) Purify gDNA → Amplify using CLA IdentiFiler Plus or CLA GlobalFiler kit → Fragment analysis by CE → Analyze using GeneMapper Software 6
Figure 1. Workflows for cell line authentication. Two methods are available for cell line authentication. (A) Cells can be spotted onto NUCLEIC-CARD
devices, punches of the cards amplified directly using the CLA IdentiFiler Direct kit, and fragments analyzed on Applied Biosystems™ CE instruments
using GeneMapper Software 6 or the MSA cloud application. (B) Alternatively, gDNA can be purified from cell lines, amplified using the CLA IdentiFiler
Plus or CLA GlobalFiler kit, and fragments analyzed by capillary electrophoresis and GeneMapper Software 6 or the MSA cloud application.
Results
Use of CLA IdentiFiler Plus or GlobalFiler kit, and gDNA
GeneMapper Software 6 simplifies the calling of STR alleles by aligning unknown fragments with a ladder of STR fragments of known allele sizes (Figure 2, top). For example, the sample containing gDNA from M4A4GFP cells has two peaks at the D7S820 locus. GeneMapper Software 6 aligns these two peaks to alleles 8 and 10 in the allelic ladder, with sizes of approximately 263 and 271 bp, respectively, without any other significant peak being present (red circle, Figure 2, bottom). Thus, at this locus, the cells are heterozygous for these two alleles. At the D19S433 locus, a single peak is present at allele 14, indicating that these cells are homozygous for this allele.

The combination of all the alleles gives a unique fingerprint to this cell culture, and can be compared to other cultures or known samples to establish authenticity.
Figure 2. GeneMapper Software 6 analysis that compares the Applied Biosystems™ IdentiFiler™ Plus Allelic Ladder (top) and purified gDNA
from M4A4GFP cells grown in culture (bottom). GeneMapper Software 6 uses the allelic ladder provided in the IdentiFiler kits to assign the alleles
present in an unknown sample to known STR alleles. The boxes below the peaks show the allele number, the height of the peak, and the size of the
fragment in base pairs.
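
The ladder-based allele assignment described above can be illustrated with a minimal sketch. This is not GeneMapper's algorithm; it only shows the idea of matching a measured fragment size to the nearest ladder fragment within a sizing bin. The ladder sizes for alleles 8 and 10 at D7S820 are the approximate values quoted in the text; the allele 9 entry, the 0.5 bp tolerance, and the names `call_allele` and `genotype` are assumptions made for illustration.

```python
from __future__ import annotations

# Illustrative ladder for the D7S820 locus. Alleles 8 (~263 bp) and 10 (~271 bp)
# are the sizes quoted in the text; allele 9 is interpolated on the assumption
# of a 4 bp repeat unit and is not taken from the kit's bin file.
D7S820_LADDER = {"8": 263.0, "9": 267.0, "10": 271.0}

BIN_TOLERANCE_BP = 0.5  # assumed half-width of a sizing bin, in base pairs


def call_allele(size_bp, ladder, tolerance=BIN_TOLERANCE_BP):
    """Return the ladder allele closest to size_bp, or None if off-ladder."""
    allele, ref_size = min(ladder.items(), key=lambda kv: abs(kv[1] - size_bp))
    return allele if abs(ref_size - size_bp) <= tolerance else None


def genotype(peak_sizes, ladder):
    """Call each peak at a locus and summarize the result."""
    calls = {call_allele(size, ladder) for size in peak_sizes}
    alleles = sorted((a for a in calls if a is not None), key=float)
    labels = {0: "no call", 1: "homozygous", 2: "heterozygous"}
    return alleles, labels.get(len(alleles), "more than two alleles")


if __name__ == "__main__":
    # Two hypothetical peak sizes close to the ladder fragments for alleles 8 and 10.
    print(genotype([263.2, 270.9], D7S820_LADDER))
```

A peak that falls outside every bin returns `None`, which mirrors the way off-ladder peaks need manual review rather than an automatic call.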
To show the utility of the CLA IdentiFiler Plus kit and SeqStudio instrument for cell line authentication, we analyzed DNA from cultures of five different commonly used cell lines. Briefly, gDNA was purified from cell pellets using the Invitrogen™ RecoverAll™ Total Nucleic Acid Isolation Kit for formalin-fixed, paraffin-embedded (FFPE) samples. However, since deparaffinization was not necessary for this sample type, we skipped those steps in the protocol. The DNA amount was determined using the Applied Biosystems™ Quantifiler™ Human DNA Quantification Kit. One nanogram of purified DNA in 10 µL water, or 3-fold serial dilutions of gDNA starting at 3 ng/µL in water, were prepared and analyzed with the CLA IdentiFiler Plus kit according to the protocol found in the User Guide (Pub. No. 4440211, Revision F). Following PCR, 1 µL of the reaction was denatured in Applied Biosystems™ Hi-Di™ Formamide, and loaded onto the instrument for capillary electrophoresis. Fragment peaks were analyzed in GeneMapper Software 6 using an imported human identification (HID) analysis method (see Appendix).

Initial experiments were performed to determine the number of PCR cycles needed for optimal allele detection on the different instrument platforms (Figure 3). Accuracy was determined by comparing against the alleles present in the ATCC database (see later section, Verifying cell authenticity against known standards) for these five cell lines. On the Applied Biosystems™ 3130xl instrument, 27–29 cycles generated the highest percentage of accurate calls, whereas on the Applied Biosystems™ 3500xL and SeqStudio instruments, 27–28 cycles produced highest-confidence data. Using these results as a guide, we recommend determining the optimal number of cycles for the CE system that will be used in your lab.

To determine the minimal amount of DNA that can be used with the CLA IdentiFiler Plus kit, we also performed serial dilutions of purified M4A4GFP gDNA (data not shown). The most accurate results were obtained when 0.3–3 ng of gDNA was used in the amplification reaction. By using the higher number of PCR cycles, we were able to accurately profile small amounts of DNA; however, at higher DNA concentrations, more PCR cycles also resulted in higher numbers of spurious peaks and reduced accuracy. Therefore, for highest confidence in the allelic calls, we recommend using 1 ng of purified gDNA in a cell line authenticity analysis.
[Figure 3 plot: correct calls (0–100%) at 26, 27, 28, and 29 PCR cycles; legend entries: SeqStudio Genetic Analyzer, 3500xL Genetic Analyzer.]
Figure 3. Analysis of gDNA using the CLA IdentiFiler Plus kit. One nanogram of DNA from five different cell lines was analyzed using varying numbers
of PCR cycles. Overall, the most accurate results were obtained when PCR was performed for 27–29 cycles.
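
The 3-fold dilution series mentioned in the methods can be worked through with a few lines of arithmetic. The sketch below is only an illustration: the starting concentration (3 ng/µL), the dilution factor (3), and the 1 ng target come from the text, while the number of dilution steps and the function names are assumptions.

```python
def serial_dilution(start_ng_per_ul, fold, steps):
    """Concentrations (ng/µL) of a serial dilution, starting stock first."""
    return [start_ng_per_ul / fold**i for i in range(steps)]


def volume_for_mass(target_ng, conc_ng_per_ul):
    """Volume (µL) of a stock needed to deliver target_ng of DNA."""
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Five steps is an arbitrary choice; the text does not state the series length.
    for conc in serial_dilution(3.0, 3.0, 5):
        vol = volume_for_mass(1.0, conc)
        print(f"{conc:.3f} ng/µL -> {vol:.2f} µL delivers 1 ng")
```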
Use of CLA IdentiFiler Direct kit and NUCLEIC-CARD device
An alternative method for authenticating cell cultures makes use of the NUCLEIC-CARD device. The matrix in these cards is chemically treated to enable cell lysis and protein denaturation so that the DNA on the card is immobilized and preserved for long-term storage at room temperature. To demonstrate the performance of these cards, we prepared suspensions of several different human cell lines in PBS (approximately 5 x 10^5 cells/mL; see Figure 4 for exact concentrations). One hundred microliters of the suspension were spotted directly onto the NUCLEIC-CARD device and dried overnight. Single 1.2 mm punches were taken from the area with the dried suspension and placed into a well of a 96-well PCR plate. Controls and reagents from the CLA IdentiFiler Direct kit were added to the plate according to the protocol supplied in the user guide (Pub. No. 4415125, Revision J), and amplified for 29 cycles. As described above, following PCR amplification, 1 µL of product was denatured in Hi-Di Formamide and analyzed on a SeqStudio, 3500xL, or 3130xl Genetic Analyzer using GeneMapper Software 6.

The minimum amount of cell suspension that can be analyzed was determined by performing serial dilutions of the starting cell suspension into PBS (Figure 4). Complete and accurate profiles were obtained from the undiluted samples across all cell lines. Diluting the suspension 10-fold before spotting onto the NUCLEIC-CARD device resulted in dropout of some alleles. The number of allele dropouts inversely correlated with the concentration of suspensions: those with slightly higher concentration had fewer dropouts, while lower concentrations had more dropouts. At a 100-fold dilution, an average of about 50% of the alleles were detectable. For highest confidence in allele calls, we therefore recommend spotting a suspension of around 5 x 10^5 cells/mL onto the NUCLEIC-CARD device.

Cell line    Starting concentration (cells/mL)
A549         2.6 x 10^5
M4A4GFP      4.7 x 10^5
U2OS         7.2 x 10^5
HeLa         2.1 x 10^5
HEK293       5.0 x 10^5
[Figure 4 plot: average percentage of correct allele calls (0–100%) for undiluted, 10-fold diluted, and 100-fold diluted suspensions.]
Figure 4. Titration of cell suspensions dried onto NUCLEIC-CARD device. Cell suspensions were prepared at the concentrations shown in the table,
then serial dilutions of those suspensions were analyzed using NUCLEIC-CARD devices and the CLA IdentiFiler Direct kit. The average percentages of
correct allele calls across all cell lines at the indicated dilutions are shown.
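
To put the dilution series in terms of cell numbers, the sketch below estimates how many cells are deposited in each 100 µL spot. The concentrations and the spot volume are taken from the text and the table above; the function name is hypothetical. Note that the result is the number of cells in the whole spot, not in a 1.2 mm punch, which samples only part of the dried area.

```python
# Starting concentrations (cells/mL) from the table in Figure 4.
STARTING_CONC = {
    "A549": 2.6e5,
    "M4A4GFP": 4.7e5,
    "U2OS": 7.2e5,
    "HeLa": 2.1e5,
    "HEK293": 5.0e5,
}

SPOT_VOLUME_ML = 0.1  # 100 µL spotted per card, as described in the text


def cells_spotted(conc_cells_per_ml, dilution_fold=1.0, volume_ml=SPOT_VOLUME_ML):
    """Approximate number of cells deposited in one spot."""
    return conc_cells_per_ml / dilution_fold * volume_ml


if __name__ == "__main__":
    for line, conc in STARTING_CONC.items():
        counts = [round(cells_spotted(conc, fold)) for fold in (1, 10, 100)]
        print(f"{line}: undiluted, 10-fold, 100-fold -> {counts}")
```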
Verifying cell authenticity against known standards
Several research organizations have recognized the need for matching unknown samples against known cell lines to establish authenticity. For example, ATCC has set up a web-based query against their database of cell line STRs (Figure 5, atcc.org/STR_Database.aspx). To use it, simply enter the alleles from the sample and choose the stringency of the query. A list of cell lines in the ATCC database that match the alleles present in the sample will be returned. Similarly, the Leibniz Institute DSMZ has a web-based STR query system in place (dsmz.de/fp/cgi-bin/str.html). Note that both of these pages query only 9 loci (8 autosomal and 1 sex-linked), and therefore not all of the loci in the CLA IdentiFiler or CLA GlobalFiler kits are used for this external database comparison. However, the additional loci are enormously useful for comparisons against locally generated controls or when establishing the provenance of a new cell line. The identity of each of the cell lines described in this application note was verified using both databases. These organizations have simplified the cell line authentication process by making it possible to compare alleles present in an unknown sample to those of known, commonly used cell lines. The International Cell Line Authentication Committee has set guidelines for interpretation: cells with 80–100% allelic matching come from the same donor, and a <50% match generally means that cells come from different donors or have different origins [7].

Identification of contaminating cells in a culture
One of the objectives of a cell line authentication solution is to determine whether a cell line is contaminated with unrelated cells. This is easily achievable, since the contaminating cells are likely to have a different STR profile than the parent line. In a mixture of cell lines, the final profile will reflect the combination of all cells present. For example, a single peak at a locus could represent both cell types being homozygous for the same allele, or one cell type being homozygous for the allele and the other homozygous for a deletion that removes the locus. Two peaks could mean both cell types are homozygous for different alleles, or heterozygous for the same alleles, etc. Although the interpretation of aberrant peaks at a single locus could be challenging and ambiguous, analysis of 16 different loci will likely identify distinct peaks that clearly point to the presence of a contaminating cell line, even if its genomic makeup might not be fully discernible.
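
The allelic-matching guideline quoted under "Verifying cell authenticity against known standards" can be sketched in code. The text does not say how the percentage is computed, so this example assumes the commonly used Tanabe formula (twice the number of shared alleles divided by the total number of alleles in both profiles) and assumes that amelogenin is excluded. The profiles are invented for illustration and do not describe any real cell line. The text gives no guidance for matches between 50% and 80%, so the sketch labels them "indeterminate".

```python
def percent_match(query, reference, exclude=("AMEL",)):
    """Tanabe-style percent match over loci present in both profiles.

    Assumed formula: 100 * 2 * shared / (alleles in query + alleles in reference).
    """
    shared = total_query = total_reference = 0
    for locus in query.keys() & reference.keys():
        if locus in exclude:
            continue
        q, r = set(query[locus]), set(reference[locus])
        shared += len(q & r)
        total_query += len(q)
        total_reference += len(r)
    if total_query + total_reference == 0:
        raise ValueError("no comparable loci")
    return 100.0 * 2 * shared / (total_query + total_reference)


def interpret(pct):
    """Apply the thresholds quoted in the text (80-100% and <50%)."""
    if pct >= 80.0:
        return "same donor"
    if pct < 50.0:
        return "different donors or origins"
    return "indeterminate"


if __name__ == "__main__":
    # Hypothetical profiles: the query has lost one allele at TPOX.
    reference = {"D7S820": ["8", "10"], "TH01": ["7"], "TPOX": ["8", "12"], "AMEL": ["X"]}
    query = {"D7S820": ["8", "10"], "TH01": ["7"], "TPOX": ["8"], "AMEL": ["X"]}
    pct = percent_match(query, reference)
    print(f"{pct:.1f}% match: {interpret(pct)}")
```

Restricting the comparison to loci present in both profiles reflects the point made above that the public databases cover fewer loci than the kits.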
An equal mixture of M4A4GFP and HeLa cells produces a profile in which extra peaks are clearly present (Figure 6A). For example, at the locus D7S820, three alleles were positively located, whereas two alleles, at most, would be expected in a homogeneous diploid cell population. Several other loci also have three alleles present. Although conclusions can’t be drawn solely from the one- and two-peak loci, the presence of three alleles at multiple loci definitely points to a heterogeneous cell population or a DNA mixture. On all platforms, we could clearly detect heterogeneity in the mixture containing 20% HeLa cells (Figure 6B). As the percentage of HeLa cells dropped in the mixture, the number of HeLa-specific alleles detected also decreased. However, the decrease was less marked on the SeqStudio and 3500xL platforms: even at 10% of the cell mixture, about 65% of the HeLa-specific alleles were still detectable. A similar analysis was performed using lower percentages of gDNA from HeLa cells mixed with M4A4GFP DNA (Figure 6C). Over half of the HeLa-specific alleles could be detected in mixtures containing as little as 4% HeLa cells, demonstrating the analytical sensitivity of the CLA IdentiFiler Plus kit.

Contaminating cells can be detected with the highest confidence if they represent >20% of the total population. However, indicator alleles may be detectable if the population contains as little as 4% of a contaminating cell line.
[Figure 6 plots: (B) unique HeLa alleles detected (%) versus HeLa cells in population (10%, 15%, 20%, 25%, 30%, 50%); (C) unique HeLa alleles detected (%) versus HeLa gDNA (1%, 2%, 3%, 4%, 5%, 10%).]
Figure 6. Detection of contaminating cells in a culture. (A) Extraneous peaks in a culture may be an indication of genomic heterogeneity, or the
presence of contaminating cells. In this case, a 1:1 mixed suspension of M4A4GFP and HeLa cells was analyzed using the NUCLEIC-CARD device
and CLA IdentiFiler Direct kit. The profile of a contaminated culture can vary, depending on the allelic makeup of the host and contaminating cells. (B)
Analysis of mixed samples on the NUCLEIC-CARD device. HeLa cells and M4A4GFP cell suspensions were diluted to 5 x 105 cells/mL each, mixed in
the indicated proportions, and spotted onto the NUCLEIC-CARD device. Identical aliquots were analyzed on the indicated instrument types. Note that
the 30% population was not analyzed on the 3500xL platform. (C) Analysis of gDNA from mixed samples. gDNA from HeLa cells was mixed with gDNA
from M4A4GFP cells at the indicated proportions. More than half of the alleles unique to HeLa cells can be detected in a mixture containing at least
4% HeLa gDNA using the 3500xL instrument.
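
The two measures used in this section, loci showing more than two alleles and the fraction of contaminant-specific alleles that are detected, can be sketched as follows. The profiles are invented for illustration and are not the HeLa or M4A4GFP profiles; the function names are hypothetical.

```python
def loci_with_extra_alleles(profile, max_expected=2):
    """Loci where more alleles were called than a diploid population would show."""
    return sorted(locus for locus, alleles in profile.items()
                  if len(set(alleles)) > max_expected)


def unique_alleles(contaminant, host):
    """(locus, allele) pairs present in the contaminant but absent from the host."""
    return {(locus, allele)
            for locus, alleles in contaminant.items()
            for allele in set(alleles) - set(host.get(locus, ()))}


def fraction_unique_detected(observed, contaminant, host):
    """Share of contaminant-specific alleles that appear in the observed profile."""
    unique = unique_alleles(contaminant, host)
    if not unique:
        return 0.0
    found = {(locus, allele) for locus, allele in unique
             if allele in observed.get(locus, ())}
    return len(found) / len(unique)


if __name__ == "__main__":
    # Hypothetical host, contaminant, and observed mixture profiles.
    host = {"D7S820": ["8", "10"], "TH01": ["7"]}
    contaminant = {"D7S820": ["8", "12"], "TH01": ["6", "7"]}
    observed = {"D7S820": ["8", "10", "12"], "TH01": ["7"]}
    print(loci_with_extra_alleles(observed))
    print(fraction_unique_detected(observed, contaminant, host))
```

A locus with three alleles is a flag for review rather than proof of contamination, since, as the caption above notes, extra peaks can also reflect genomic heterogeneity within a single line.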
Allelic imbalance in cell lines
In STR analyses, normal genotypes present themselves as either a single peak (homozygous) or two balanced peaks with equal height (heterozygous). However, cell lines passaged in vitro demonstrate genomic instability, and the peak heights of sister alleles can differ due to duplication or deletion of loci, portions of chromosomes, or entire chromosomes. For example, the HEK293 cells used in this study are balanced at the D8S1179 locus, but show significantly different peak heights at the CSF1PO and D13S317 loci (Figure 7). Moreover, when the DNA quantity added to the amplification reaction is 100 pg or less, stochastic effects may introduce imbalances in peak height.

Figure 7. Allelic imbalance in cell lines. In normal cells, the peak height of heterozygotes is very similar; in cell lines, peak heights can often differ at a locus (red circles). This is possibly due to genomic instability of the cell line, leading to duplication and deletions of loci and therefore different abundances of alleles.
[Figure 8 plots: allele ratio (0–5) per sample at each of 16 loci (FGA, AMEL, D18S51, D3S1358, TH01, D5S818, CSF1PO, D19S433, TPOX, D21S11, D7S820, D13S317, vWA, D8S1179, D2S1338, D16S539). Samples: donor gDNA NA17123, NA17205, NA17207, NA17213, NA17214, NA17215, NA17225, NA17239; KitControl+; cell lines U2-OS, M4A4GFP, HeLa, A549, HEK293.]
Figure 8. Cell lines demonstrate measurable allelic imbalance. Eight standardized gDNA samples from human donors (left of the green line) and five cell lines (right of the green line) were analyzed with the CLA IdentiFiler Plus kit. Allele ratios were calculated using the heights of peaks in heterozygotes, or defined as 1.0 in homozygotes (red line). Note that most of the samples from normal human donors have allele ratios close to 1.0, whereas the cell line [...]
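
The allele ratio described in the Figure 8 caption can be sketched as below. The caption does not say which peak goes in the numerator, so this example assumes the taller peak is divided by the shorter one, giving a ratio of 1.0 or more. The 1.5 cutoff used to flag imbalance, the peak heights, and the function names are hypothetical and chosen only for illustration.

```python
def allele_ratio(peak_heights):
    """Ratio of taller to shorter peak at a locus; 1.0 for a single peak."""
    if len(peak_heights) == 1:
        return 1.0  # homozygous loci are defined as 1.0, as in the caption
    if len(peak_heights) != 2:
        raise ValueError("expected one or two peaks per locus")
    taller, shorter = max(peak_heights), min(peak_heights)
    if shorter <= 0:
        raise ValueError("peak heights must be positive")
    return taller / shorter


def imbalanced_loci(profile, cutoff=1.5):
    """Loci whose allele ratio exceeds an assumed cutoff."""
    return sorted(locus for locus, heights in profile.items()
                  if allele_ratio(heights) > cutoff)


if __name__ == "__main__":
    # Hypothetical peak heights (relative fluorescence units) for three loci.
    profile = {"D8S1179": [1000, 950], "CSF1PO": [1500, 500], "D19S433": [1200]}
    for locus, heights in profile.items():
        print(locus, round(allele_ratio(heights), 2))
    print(imbalanced_loci(profile))
```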
Appendix
GeneMapper modules and settings needed for analysis
Instructions below are given for importing into GeneMapper Software 6. For use with the cloud-based MSA software, please contact your local field applications scientist.

Before analyzing the FSA files for a cell line authentication project, the appropriate BIN files must be imported into GeneMapper Software 6. To import the files needed, follow the instructions below.
• AmpFℓSTR_Bins_v2.txt and AmpFℓSTR_Panels_v2.txt can be downloaded here: thermofisher.com/us/en/home/technical-resources/software-downloads/genemapper-id-software.html

To install the “Panel File”:
– Single-click on “Panel Manager” in the left-hand window pane.
– Select “File” and “Import Panels” from the menu bar.
– A dialog box will appear. Navigate to the location of the “Panel” file on your computer. If you placed it in the default “Panels” folder, the dialog box should open to the correct folder.
– Select the file titled AmpFℓSTR_Panels_v2.txt
– This will install the “Panel” in the left window pane.
3500xL POP-7 ™
1.6 15 19.5 1,330 50 cm FragmentAnalysis50_POP7xl
To install the “Bin File”:
– Single-click on “AmpFℓSTR_Panels_v2” in the left-hand window pane.
– Select “File” and “Import Bin Set” from the menu bar.
– A dialog box will appear. Navigate to the location of the “Bin” file on your computer. If you placed it in the default “Panels” folder, the dialog box should open to the correct folder.
– Select the file titled AmpFℓSTR_Bins_v2.txt
– This will install the “Bins” in the pull-down menu at the top titled “Bin Set”.

• Click on “Apply”, followed by “OK” to return to GeneMapper Software 6. The IdentiFiler_v2 panel and associated “Bin Set” can be chosen under “Panel” when performing analysis.
• To set up a cell line authentication analysis method, either modify an existing method or create a new one using the values in Figure 9A as a guide.
• To set up a plot setting to visualize the alleles detected, create a new setting and enter the parameters shown in Figure 9B.
• To view more than two alleles in the genotyping tables, change the “Allele Settings” in the “Genotypes” tab of the “Table Setting Editor” (Figure 9C).
Figure 9. GeneMapper Software 6 settings for (A) cell line authentication analysis method, (B) cell line authentication plot settings, and
(C) viewing more than two alleles in the genotyping tables. These screens can be accessed by opening the “Manager Tool” and choosing
the appropriate tab.
Ordering information
Product Quantity Cat. No.
SeqStudio Genetic Analyzer System with SmartStart orientation (Includes SeqStudio Genetic Analyzer, SeqStudio Genetic Analysis Software, SmartStart 1-day training, 1-year warranty) A35644
SeqStudio Genetic Analyzer System with SmartStart orientation plus 1-year extended warranty (Includes all items from A35644 plus additional 1-year warranty) A35645
SeqStudio Genetic Analyzer System with SmartStart orientation plus 3-year extended warranty (Includes all items from A35644 plus additional 3-year warranty) A35646
SeqStudio Cartridge v2 1,000 reactions A41331
3500 Genetic Analyzer 1 system 4440466
3500xL Genetic Analyzer 1 system 4440467
CLA IdentiFiler Direct PCR Amplification Kit 200 reactions A44661
NUCLEIC-CARD COLOR matrix, 4 spots 50 cards 4473978
NUCLEIC-CARD matrix, 1 spot 100 cards 4473973
Prep-n-Go Buffer (for use with buccal swab substrate) 200 reactions 4471406
Prep-n-Go Buffer (for use with untreated paper substrates) 1,000 reactions 4467079
CLA IdentiFiler Plus PCR Amplification Kit 50 reactions A47624
CLA IdentiFiler Plus PCR Amplification Kit 200 reactions A44660
CLA GlobalFiler PCR Amplification Kit 200 reactions A44662
RecoverAll Total Nucleic Acid Isolation Kit for FFPE 40 reactions AM1975
GeneMapper Software 6 1 license A38892
GeneScan 600 LIZ v2.0 800 reactions 4408399
References
1. Lorsch JR et al. (2014) Fixing problems with cell lines. Science 346(6216):1452–1453.
2. Huang Y et al. (2017) Investigation of cross-contamination and misidentification of 278 widely used tumor cell lines.
PLoS One 12(1):e0170384. doi: 10.1371/journal.pone.0170384.
3. He Y et al. (2016) Retracted: Knockdown of tumor protein D52-like 2 induces cell growth inhibition and apoptosis in oral
squamous cell carcinoma. Cell Biol Int 40:361.
4. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-08-017.html
5. Yu M et al. (2015) A resource for cell line authentication, annotation and quality control. Nature 520:307–311.
6. Neimark J (2015) Line of attack. Science 347:938–940.
7. http://standards.atcc.org/kwspub/home/the_international_cell_line_authentication_committee-iclac_/Authentication_SOP.pdf