Endocrine-Related Cancer 14 (3) 809-826    DOI: 10.1677/ERC-06-0048
Copyright © 2007 by the Society for Endocrinology.

A multi-gene approach to differentiate papillary thyroid carcinoma from benign lesions: gene selection using support vector machines with bootstrapping

Krzysztof Fujarewicz1, Michal Jarzab3,4, Markus Eszlinger5, Knut Krohn5,6, Ralf Paschke5, Malgorzata Oczko-Wojciechowska2, Malgorzata Wiench2, Aleksandra Kukulska2, Barbara Jarzab2 and Andrzej Swierniak1,2

1 Systems Engineering Group, Institute of Automatic Control, Silesian University of Technology, 44-100 Gliwice, Poland
2 Department of Nuclear Medicine and Endocrine Oncology, Institute of Oncology, Maria Sklodowska-Curie Memorial Cancer Center, Gliwice Branch, Wybrzeze Armii Krajowej 15, 44-100 Gliwice, Poland
3 Departments of Tumor Biology and
4 Clinical Oncology, Institute of Oncology, Maria Sklodowska-Curie Memorial Cancer Center, Gliwice Branch, 44-100 Gliwice, Poland
5 III. Medical Department and
6 Interdisciplinary Center of Clinical Research Leipzig, University of Leipzig, 04103 Leipzig, Germany

(Requests for offprints should be addressed to B Jarzab; Email: bjarzab{at}io.gliwice.pl)


    Abstract
 
Selection of novel molecular markers is an important goal of cancer genomics studies. The aim of our analysis was to apply multivariate bioinformatic tools to rank genes that are potential markers of papillary thyroid cancer (PTC) according to their diagnostic usefulness. We also assessed the accuracy of benign/malignant classification, based on gene expression profiling, for PTC. We analyzed a 180-array dataset (90 HG-U95A and 90 HG-U133A oligonucleotide arrays), which included a collection of 57 PTCs, 61 benign thyroid tumors, and 62 apparently normal tissues. Gene selection was carried out by the support vector machines method with bootstrapping, which allowed us 1) to rank the genes that were most important for classification quality and appeared most frequently in the classifiers (bootstrap-based feature ranking, BBFR); and 2) to rank the samples, and thus detect the cases that were most difficult to classify (bootstrap-based outlier detection, BBOD). The accuracy of PTC diagnosis was 98.5% for a 20-gene classifier; its 95% confidence interval (CI) was 95.9–100%, with the lower limit of the CI exceeding 95% already for five genes. Only 5 of 180 samples (2.8%) were misclassified in more than 10% of bootstrap iterations. We specified 43 genes which are most suitable as molecular markers of PTC, among them some well-known PTC markers (MET, fibronectin 1, dipeptidylpeptidase 4, or adenosine A1 receptor) and potential new ones (UDP-galactose-4-epimerase, cadherin 16, gap junction protein 3, sushi, nidogen, and EGF-like domains 1, inhibitor of DNA binding 3, RUNX1, leiomodin 1, F-box protein 9, and tripartite motif-containing 58). The highest ranking gene, metallophosphoesterase domain-containing protein 2, achieved 96.7% of the maximum BBFR score.


    Introduction
 
Discrimination between benign thyroid nodules and cancer is an important aspect of determining the optimal extent of thyroid surgery. Currently, this is achieved by routine morphologic assessment of cytopathology samples. However, this method does not allow proper classification of all thyroid tumors (Baloch & Livolsi 2002, Franc et al. 2003). At several institutions, genomic studies have been undertaken which, besides focusing on basic biological issues (Huang et al. 2001, Giordano et al. 2005), also explore potential diagnostic applications (Aldred et al. 2004, Chevillard et al. 2004, Finley et al. 2004a,b). Our recent microarray-based analysis yielded a 20-gene classifier to differentiate between papillary thyroid cancer (PTC) and normal thyroid tissue (Jarzab et al. 2005), which was further verified using three independent datasets (Eszlinger et al. 2006). Very large and easily distinguishable differences between the molecular profiles of PTC and normal thyroid have clearly demonstrated the applicability of gene expression findings to diagnostic purposes. However, even more desirable for the clinician would be a capability, based on genomic profiling, to discriminate between malignant tumors and various benign lesions. Therefore, we decided to use a balanced mixture of samples from malignant and benign tumors and normal thyroid tissue to mimic the clinical situation, where material from any of these may be obtained and must be properly classified. This large 180-array dataset is derived from de novo studies (n = 40), our own previously published microarray data (n = 124; Eszlinger et al. 2001, 2004, Jarzab et al. 2005), and accessible datasets published by other authors (n = 16; Huang et al. 2001).

We set the following goals for the study:

  1. To assess accuracy of benign/malignant classification of thyroid specimens in relation to gene set size, in the context of PTC and
  2. To optimize the list of diagnostically relevant genes in PTC.

To address both goals, we used the support vector machines (SVMs) method with bootstrapping. This approach relies on iterative construction of SVM classifiers based on randomly selected sets of specimens (bootstrap samples) and on testing the classifiers on the remaining samples. We applied the bootstrap to obtain both a gene (feature) ranking and outlier detection. The ranking of the genes that are most important for classification quality was based on the frequency of their occurrence in classifiers of different sizes (bootstrap-based feature ranking, BBFR). The ranking of the misclassified samples allowed us to detect outliers (bootstrap-based outlier detection, BBOD) and to obtain a reliable estimate of classification accuracy with appropriate confidence intervals (CI) for gene sets of different sizes.


    Material and methods
 
Microarray data used in the study

Microarray datasets from three sources were included in the analysis:

  1. Dataset obtained in Gliwice, Poland; in total, 90 specimens analyzed with GeneChip HG-U133A microarrays. The specimens were collected from 71 patients: 49 with PTC (9 males and 40 females; mean age 36 years, range 6–71 years) and 22 with other thyroid diseases (6 with follicular adenoma, 13 with nodular or colloid goiter, and 3 with chronic thyroiditis; 9 males and 13 females; mean age 45 years, range 11–71 years). The thyroid tissue specimens included 49 PTC tumors and 41 normal/benign thyroid tissue samples. The latter samples were from patients with PTC (n = 17) or were other benign thyroid lesions (n = 24), among them six follicular adenomas, four nodular goiters, nine colloid goiters, and five cases of thyroiditis, two of them taken from the contralateral lobe of patients with PTC. Fifty microarrays were included in our previously published study and are publicly available at www.genomika.pl/thyroidcancer (Jarzab et al. 2005); 40 microarrays were from de novo studies. All new samples were processed according to the description given in Jarzab et al. (2005).
  2. Dataset obtained in Leipzig, Germany; 74 specimens analyzed with GeneChip HG-U95Av2 microarrays. The specimens included 15 autonomously functioning thyroid nodules, 22 cold thyroid nodules, and 37 samples of their respective surrounding thyroid tissues. The analysis of these datasets was published previously (Eszlinger et al. 2001, 2004) and the datasets are available at http://www.uni-leipzig.de/innere/_forschung/schwer-punkte/etiology.html.
  3. Dataset obtained in Columbus, OH, USA; 16 specimens analyzed with GeneChip HG-U95A microarrays. The specimens were derived from eight patients and included both PTC tumors and their surrounding thyroid tissues. The dataset (Huang et al. 2001) is publicly available at http://thinker.med.ohio-state.edu.

In total, the three analyzed datasets comprised 57 PTCs, 61 benign thyroid lesions, and 62 apparently normal thyroid tissues analyzed on 180 GeneChips of two different generations. Half of them were U133A and the rest U95A platforms.

Data pre-processing and generation of datasets

Each dataset was pre-processed by the MAS5 algorithm. To compare the expression data generated using the U95A GeneChips (12 625 probe sets) with those from the U133A GeneChips (22 283 probe sets), we used the ‘Human Genome U95 to Human Genome U133 Best Match Comparison Spreadsheet’ (www.affymetrix.com/support/technical/comparison_spreadsheets.affx) which yielded an intersection of 9530 probe sets. The obtained data were log2 transformed.
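The platform-matching and transformation step can be illustrated with a short sketch. It is only a toy under stated assumptions: the function name, the probe set identifiers, and the `best_match` dictionary (standing in for the vendor's Best Match spreadsheet) are hypothetical, and the signal values are invented.

```python
import math

def intersect_and_log2(u95, u133, best_match):
    """Keep only probe sets present on both platforms and log2-transform them.

    u95, u133  : dict mapping probe set id -> list of MAS5 signals (one per array)
    best_match : dict mapping a U95 probe set id to its U133 counterpart
                 (hypothetical stand-in for the Best Match spreadsheet)
    """
    merged = {}
    for u95_id, u133_id in best_match.items():
        if u95_id in u95 and u133_id in u133:
            signals = u95[u95_id] + u133[u133_id]  # arrays from both platforms
            merged[(u95_id, u133_id)] = [math.log2(v) for v in signals]
    return merged

# Toy data: only the pair ("a_at", "A_at") exists on both platforms.
u95 = {"a_at": [8.0, 16.0], "b_at": [4.0, 2.0]}
u133 = {"A_at": [32.0], "C_at": [64.0]}
best_match = {"a_at": "A_at", "b_at": "B_at"}
merged = intersect_and_log2(u95, u133, best_match)
```

In the study itself this intersection contained 9530 probe sets; the sketch does not attempt any further cross-platform normalization.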

Neighborhood analysis and recursive elimination in gene selection

For selection of gene sets with diagnostic potential, we applied the recursive feature elimination (RFE) algorithm (Guyon et al. 2002), which is computationally less demanding than the recursive feature replacement used in our previous studies (Jarzab et al. 2005, Eszlinger et al. 2006). Preliminary gene selection was performed using neighborhood analysis (200 genes; Golub et al. 1999, Slonim et al. 2000); further selection of the set of the 100 best genes was carried out by RFE.
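A minimal sketch of the preselection step is given below. It assumes the signal-to-noise statistic commonly associated with neighborhood analysis, (mean of class 1 minus mean of class 0) divided by the sum of the two class standard deviations; the use of the sample standard deviation, the function names, and the toy values are assumptions. The subsequent RFE step, which needs SVM weights, is not shown.

```python
import statistics

def signal_to_noise(values, labels):
    """Signal-to-noise score of one gene; labels: 1 = PTC, 0 = benign/normal."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    return (statistics.mean(pos) - statistics.mean(neg)) / spread

def preselect(expr, labels, n_keep):
    """Return the n_keep genes with the largest absolute score.

    expr: dict mapping gene -> list of log2 expression values (one per sample).
    """
    ranked = sorted(expr,
                    key=lambda g: abs(signal_to_noise(expr[g], labels)),
                    reverse=True)
    return ranked[:n_keep]

# Toy example with three hypothetical genes and six samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "up":   [9, 10, 11, 1, 2, 3],
    "down": [1, 2, 3, 5, 6, 7],
    "flat": [5, 6, 7, 5, 6, 7],
}
selected = preselect(expr, labels, n_keep=2)
```

With `n_keep=200`, the same ranking would reproduce the size of the preselected set described above.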

SVMs and classification

The linear SVM (Boser et al. 1992, Vapnik 1995) was used for developing the classification rule. As mentioned earlier, the classifier was independently trained for different numbers of selected genes (from 1 to 100).

Bootstrap for estimation of classifier accuracy and its CI

To determine the accuracy of the developed classifier, we performed a classical bootstrap procedure with 500 resampling iterations (sampling with equal probability and with replacement; Efron 1979). All stages of classifier construction (i.e. gene preselection, gene selection, and classifier learning) were repeated in each bootstrap iteration, as suggested previously (Simon et al. 2003). The accuracy of the classifier was calculated using the 0.632 bootstrap estimator (Efron 1983). The distribution of the misclassification rate obtained over all bootstrap runs was used to estimate the 95% CI. The accuracy of the classifier and the CI were calculated for different numbers of selected genes (up to 100).
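The two quantities described above can be sketched as follows. This is an illustration, not the authors' code: the function names are hypothetical, the weights 0.368 and 0.632 are those of Efron's estimator, and the percentile rule for the interval is an assumption, since the text states only that the distribution of misclassification rates over bootstrap runs was used.

```python
import math

def estimator_632(resubstitution_error, out_of_bag_error):
    """0.632 bootstrap estimate of the misclassification rate."""
    return 0.368 * resubstitution_error + 0.632 * out_of_bag_error

def percentile_interval(values, level=0.95):
    """Percentile interval over per-run error rates (assumed CI construction)."""
    ordered = sorted(values)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    lower = ordered[math.floor(alpha * (n - 1))]
    upper = ordered[math.ceil((1.0 - alpha) * (n - 1))]
    return lower, upper

# Toy numbers: a classifier that fits its training data perfectly but
# misclassifies 5% of the left-out samples on average.
error_632 = estimator_632(0.0, 0.05)
accuracy_632 = 1.0 - error_632
```

The estimator weights the optimistic training-set error against the pessimistic left-out error, which is why it is preferred to either figure alone.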

Bootstrap based feature ranking (BBFR) and outlier detection (BBOD)

The primary purpose of the bootstrap used in this study was to estimate the accuracy of the molecular classifier for different sizes of gene subsets with appropriate CIs. However, the computational effort for the bootstrap technique may also be exploited to derive some additional information. We apply two methods that use the information collected during bootstrapping: BBFR and BBOD. They are similar to the methods of statistical learning based on resampling, such as bagging and boosting. In both techniques, an ensemble of many base classifiers is created. Each base classifier is trained on different bootstrap subsamples. The final decision is based on decisions of all base classifiers. The simplest approach is bagging (bootstrap aggregating) originally proposed by Breiman (1996). In bagging, the subsamples are randomly drawn as in classical bootstrapping where each observation is picked with the same probability 1/m, where m is the number of all observations. The final decision is the decision of most base classifiers. In boosting, different observations may be picked with different probability and the final decision is weighted sum of decisions of base classifiers. The well-known boosting algorithm is AdaBoost (Freund & Schapire 1996).

In our approach, we do not create an ensemble (committee) of many base classifiers but we use the information collected during bootstrap-based validation step of the SVM classifier.

Let the data contain m instances (observations). Each instance is a vector of Nmax features (gene expression values) with a corresponding class label specified by an expert. Let LB be the number of bootstrap iterations. In each iteration, we draw m instances from the dataset with equal probability and with replacement (the bootstrap sample). The bootstrap sample is then used for feature selection and classifier learning. Finally, the classifier is tested on the test set containing all instances that do not belong to the bootstrap sample.
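One bootstrap iteration of this scheme can be sketched as below; the function name and the fixed seed are illustrative choices only.

```python
import random

def bootstrap_split(m, rng):
    """Draw m indices with replacement and return (bootstrap sample, test set).

    The test set holds every index that was never drawn.
    """
    sample = [rng.randrange(m) for _ in range(m)]
    drawn = set(sample)
    test = [k for k in range(m) if k not in drawn]
    return sample, test

rng = random.Random(0)           # fixed seed only to make the sketch repeatable
sample, test = bootstrap_split(180, rng)
```

Because each instance is missed with probability (1 - 1/m)^m, which approaches 1/e for large m, roughly 37% of the instances end up in the test set in a typical iteration.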

To find the optimal size for the feature set, we select N feature sets $\Omega_1, \Omega_2, \ldots, \Omega_N$ of sizes 1, 2, ..., N respectively. In general, the selected sets need not overlap, but in the most commonly used feature selection methods, based on feature ranking or backward/forward searching, the feature subsets satisfy the relation

$$\Omega_1 \subset \Omega_2 \subset \cdots \subset \Omega_N \qquad (1)$$

BBFR

Let rj(i) be the number of subsets $\Omega_n$, n = 1,2,...,N, selected in the i-th bootstrap run to which gene j belongs. For gene selection methods satisfying equation (1), we have


Formula 2(2)

The BBFR score Rj of feature j is defined as the sum of rj(i) over all bootstrap runs:

$$R_j = \sum_{i=1}^{L_B} r_j(i) \qquad (3)$$

The maximum possible value of the BBFR score is $L_B N$.
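The BBFR score can be sketched for the nested case of equation (1). The sketch assumes that each bootstrap run yields a ranked gene list and that the subset of size n consists of the first n genes; a gene at position p (counting from 1) then belongs to the subsets of sizes p to N, i.e. to N - p + 1 subsets. Gene names and rankings are invented toy data.

```python
from collections import Counter

def bbfr_scores(rankings, n_max):
    """Sum, over bootstrap runs, the number of nested subsets containing each gene.

    rankings : list of ranked gene lists, one per bootstrap run (best gene first)
    n_max    : number N of nested subsets (sizes 1..N)
    """
    scores = Counter()
    for ranking in rankings:
        for position, gene in enumerate(ranking[:n_max], start=1):
            scores[gene] += n_max - position + 1
    return scores

# Three toy bootstrap runs (L_B = 3) with N = 3.
rankings = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneC", "geneB"],
    ["geneB", "geneA", "geneD"],
]
scores = bbfr_scores(rankings, n_max=3)
max_score = len(rankings) * 3    # L_B * N
```

A gene ranked first in every run reaches the maximum; with 500 runs and N = 100 that maximum is 50 000.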

BBOD

Let qk be the number of bootstrap iterations in which observation k is chosen as a test instance (i.e. it is not a member of the bootstrap sample). Let qktrue be the number of bootstrap iterations in which instance k is correctly classified at the test stage.

The BBOD score for the k-th observation is

$$Q_k = \frac{q_k^{\mathrm{true}}}{q_k} \qquad (4)$$

The value of Qk belongs to the interval [0, 1], and a low value indicates an outlier.
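A small sketch of the BBOD bookkeeping follows. The data layout (one dictionary per bootstrap run, mapping each test instance to whether it was classified correctly) and the function name are assumptions made for illustration.

```python
def bbod_scores(test_results, m):
    """Fraction of test appearances in which each instance was classified correctly.

    test_results : list of dicts, one per bootstrap run, mapping the index of
                   each test instance to True (correct) or False (misclassified)
    m            : total number of instances
    Returns a list of Q_k values; None marks an instance never used for testing.
    """
    q = [0] * m
    q_true = [0] * m
    for run in test_results:
        for k, correct in run.items():
            q[k] += 1
            q_true[k] += int(correct)
    return [q_true[k] / q[k] if q[k] else None for k in range(m)]

# Toy example: instance 1 is always misclassified and is flagged by Q = 0.
runs = [{0: True, 1: False}, {0: True, 2: True}, {1: False, 2: False}]
scores_q = bbod_scores(runs, m=4)
```

Sorting the instances by ascending score gives the outlier ranking.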

Comparison of different class prediction methods

We used BRB ArrayTools (developed by Dr Richard Simon and Amy Peng Lam) to compare different class prediction algorithms (Compound Covariate Predictor, Linear Diagonal Discriminant Analysis, Nearest Centroid, 1-Nearest Neighbor, 3-Nearest Neighbors, and SVMs). The misclassification rate was computed with the 0.632 bootstrap cross-validation method. All genes with a univariate misclassification rate below 0.2 were used for this analysis.


    Results
 
Accuracy of malignant/benign classification and redundancy of PTC gene classifiers

The huge difference in gene expression between PTC and benign/normal thyroid tissues implies that many multigene classifiers with similar classification ability may be created. For a preliminary assessment of the accuracy of differentiation between PTC and benign lesions or normal thyroid, we randomly divided the 180-array dataset into two subgroups, according to sample number: A (odd numbers) and B (even numbers). Each subgroup contained data from a similar number of benign and malignant tumor specimens analyzed with U133A or U95A GeneChips. We used set A to obtain a 20-gene classifier; this classifier was tested on set B, and the procedure was repeated using set B as the training set and testing the classifier on set A. Using the classifier obtained from set A, we were able to correctly predict 86 out of 90 samples (95.6%) within set B, while using the classifier obtained from set B, we accurately diagnosed 88 out of 90 samples in set A (97.8%). Both classifiers differed partly from our previous 20-gene classifier (Jarzab et al. 2005) obtained on a smaller dataset.

To avoid a bias in gene selection and accuracy estimation related to the arbitrary selection of the training set, we carried out accuracy estimation by bootstrapping, i.e. randomly selecting a large number of slightly different training sets and validating the resulting classifiers on the remaining samples. This procedure allows the use of sufficiently large training sets while simultaneously obtaining a reliable estimate of classification accuracy. By applying this method, we estimated the accuracy of discrimination between benign and malignant samples to be 98.6%, with a rather narrow CI (see Fig. 1). For small gene sets, the accuracy was slightly lower (93.7% for a one-gene set, 96.9% for a two-gene set, 97.9% for a three-gene set, and from 98.3 to 98.6% for larger sets, up to n = 100). For the 20-gene classifier, the accuracy was 98.5%, and the estimated 95% CI was 95.9–100% for the classifiers built from more than five genes.


 
Figure 1 Accuracy of bootstrapping-estimated benign–malignant classification for different gene set sizes. The 95% confidence interval is marked by dashed lines.

 
We compared the results of classification by the best 500 genes (Fig. 2) with the classification by consecutive 500-gene sets (i.e. first 500, 500–1000, 1000–1500, etc). We noted that only the first 500 genes allow accurate classification of samples by single genes or small gene sets. Genes ranked 500–1000 achieved 90% accuracy only for classifiers larger than 50 genes, while genes beyond the first 1000 hardly achieved this level of accuracy. When we excluded all genes analyzed in Fig. 2 (8 × 500 = 4000), the accuracy obtained for small sets was only ~60%, close to random. However, the accuracy rose with gene set size, and for classifier sets larger than 700 genes it reached 90% (data not shown). These results support the conclusion that the PTC transcriptome differs from the normal one in thousands of genes; they also provide evidence that optimizing a diagnostic gene set is a necessary step of the analysis in order to make this set useful for molecular PTC classification.


 
Figure 2 Accuracy of classification obtained by successive gene set reduction. The accuracy of the best 500 genes was evaluated in one iteration using the bootstrap technique, then the selected 500-gene set was removed from the whole dataset, and the next 500 genes were selected in the following iteration. This procedure was repeated seven times, thus 3500 genes were excluded (line no. 8). To speed up the procedure, only neighbourhood analysis (NA) was used for gene selection.

 
Ranking of PTC genes for their classification ability

To obtain the ranking of genes based on their usefulness in the diagnostic context, we performed a repetitive gene selection process by bootstrapping the whole dataset. We ranked all genes according to the frequency of their appearance within the selected gene sets (BBFR). Genes important for the majority of diagnostic datasets were highly ranked, while less importance was given to complementing transcripts, which exhibited higher variability (Fig. 3). During the selection process, 365 transcripts occurred at least once within the obtained classifiers and some of them were present in nearly all classifiers. The maximum theoretical score to be obtained by a gene was 5 × 10^4, and the gene with the best rank, encoding metallophosphoesterase domain-containing protein 2 (MPPED2), had a score of 4.84 × 10^4, i.e. 96.7% of the maximum one. The first 20 genes were given scores > 3.74 × 10^4 ( > 77% of the maximum score obtained), only slightly lower than the top gene, and the first 100 transcripts were characterized by scores > 0.64 × 10^4, which is > 13.2% of the maximum score obtained. In total, 43 transcripts representing 41 genes scored higher than half of the value for the top gene ( > 2.42 × 10^4, Fig. 3). Among them, there were genes known for their changed expression in PTC or described in previous microarray studies, some already used as single markers, as well as new genes not considered previously for their diagnostic potential (Table 1).


 
Figure 3 Result of bootstrap-based feature ranking (BBFR). Each dot represents one gene, dashed lines define the subset of 43 genes with BBFR score larger than half of the maximum one (black dots).

 

 
Table 1 Ranking of papillary thyroid cancer (PTC) genes as assessed by bootstrap-based feature ranking (BBFR) approach. For each transcript selected, rank and score obtained by the BBFR method are given, together with basic univariate statistics (log2 mean and log2 ratio)
 
We analyzed fold-change differences between PTC and benign thyroid samples for the 43 selected transcripts to evaluate the potential influence of inter-platform differences on the obtained gene selection. Twenty of them showed more than a fourfold increase (log2 ratio > 2) and four transcripts were increased more than twofold, whereas the remaining 19 transcripts were decreased. Generally, the consistency between fold-changes observed in subsets from U95 and U133 arrays was good, although for some genes (e.g. the well-known thyroid cancer markers fibronectin 1 (FN1) and MET or novel genes cadherin 16 (CDH16) or gap junction protein β-3 (GJB3)) there were inter-platform differences between the log ratios. However, 40 out of 43 selected genes exhibited more than twofold change in both the U133 and the U95 subsets. For all 43 genes, the PTC–benign difference was larger than the difference between fold-changes obtained with different GeneChip generation subsets. This confirms that the selection performed was robust to inter-array differences.
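The relation between log2 ratios and fold-changes used above can be made explicit with a short sketch. The function names are hypothetical, and requiring the same direction of change on both platforms is an assumption that the text does not state.

```python
def fold_change(log2_ratio):
    """Convert a log2 ratio (PTC versus benign) into a fold-change."""
    return 2.0 ** log2_ratio

def twofold_on_both(log2_u95, log2_u133):
    """True if a gene changes more than twofold, in the same direction,
    in both the U95 and the U133 subsets (same-direction rule is assumed)."""
    same_direction = (log2_u95 > 0) == (log2_u133 > 0)
    return same_direction and abs(log2_u95) > 1 and abs(log2_u133) > 1

fourfold = fold_change(2)      # log2 ratio of 2 corresponds to a fourfold increase
halved = fold_change(-1)       # log2 ratio of -1 corresponds to a twofold decrease
```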

Misclassified thyroid samples

The algorithm with bootstrapping allows ranking the samples according to the frequency of their misclassification (Table 2). BBOD showed very frequent misclassifications for two samples. One of them was not properly classified by any gene set selected; this was sample no. 154 from the U133 dataset no. 1, a small (10 mm in diameter) familial PTC found within a larger follicular adenoma. It was observed in an 18-year-old woman. A year later her mother, 43 years old, was diagnosed with a 0.7 cm PTC (follicular variant). The other one, properly classified in only 8% of runs, was a benign follicular adenoma (diagnosed as atypical) from the same dataset (sample no. 97), which was derived from a 15-year-old boy of another family with familial PTC. In this family, there were two PTC cases (the mother of the patient, diagnosed with pT2BNxM0 PTC, and her aunt, who died of disseminated PTC) and one follicular thyroid cancer case (pT2bNxM0, 11 years old, sister of the patient). These were the only two cases with a positive family history of thyroid cancer among the 49 Polish patients included in the study. Two further samples were properly classified in 65–68% of runs (one from dataset no. 1 and one from dataset no. 3), again one benign adenoma and one PTC, respectively. For the fifth sample, the accuracy was much higher and it was properly classified in 88% of the runs. Thus, only 5 out of 180 samples (2.8%) were misclassified in more than 10% of the runs, while a total of 14 samples (7.8%) were misclassified in more than 1% of the runs. Seventy samples were classified with an excellent accuracy between 99 and 100%, and for a further 64 cases no misclassification occurred during the bootstrapping process.


 
Table 2 Ranking of thyroid samples by tumor–normal misclassification frequency, assessed by bootstrap-based outlier detection (BBOD) approach. The BBOD rank and score Qk, as defined in Material and methods, is given
 
Comparison of classification accuracy by different class prediction methods

To evaluate our method, we compared the accuracy of prediction by the different class prediction methods implemented in BRB ArrayTools. We based the class prediction on all genes that showed a univariate misclassification rate lower than 20%. We found that the classification accuracy ranged from 89% (compound covariate predictor method) to 99% (SVM), which confirmed that SVM-based methods performed best on these data (Table 3).


 
Table 3 Comparison of results obtained by different class prediction methods
 

    Discussion
 
Transcripts important for discriminating PTC from benign and normal thyroid samples

In the study, we performed an advanced optimization of putative PTC markers using a large group of benign thyroid lesions and normal thyroid tissues and proposed a list of 43 transcripts, selected by their most frequent appearance in the classifiers. An additional proof of their efficacy was obtained by hierarchical clustering (all samples clustered correctly, data shown in the web appendix to this article, www.genomika.pl/thyroidcancer). Forty-one of them (95.3%) could be attributed to 39 known genes, 32 well-defined ones, and 7 of unknown or not well-defined function. There were 12 genes which had never before been related to the thyroid gland nor mentioned in genomic studies of thyroid cancer, while 29 genes (74%) were identified in previous thyroid microarray studies. However, only ten of them were discussed in the original papers for their putative role in thyroid carcinoma. Within the list of the well-known genes which received high scores by BBFR, one should mention the genes encoding FN1, met proto-oncogene (MET; both scored 4.4 × 10^4), dipeptidylpeptidase 4 (DPP4), adenosine A1 receptor (ADORA1), keratin 19, and B-cell CLL (BCL2) (Huang et al. 2001, Wasenius et al. 2003, Baris et al. 2004, Chevillard et al. 2004, Finley et al. 2004a, Wreesmann et al. 2004, Giordano et al. 2005), all up-regulated with the exception of BCL2. Their inclusion in our classifier positively validates the applied criteria. All these genes except ADORA1 were previously found by single gene studies (see Table 1) and later confirmed by microarray approaches. Moreover, in the recent meta-analysis of thyroid cancer gene expression profiles, MET and FN1 were included among the top 12 candidates for consistent gene expression markers (Griffith et al. 2006). Similarly, the thyroid-specific (down-regulated) genes deiodinase, iodothyronine, type I and thyroid peroxidase were widely recognized previously for their diagnostic significance both in microarray-based (Eszlinger et al. 2001, Huang et al. 2001, Baris et al. 2004, Cerutti et al. 2004, Finley et al. 2004a, Wreesmann et al. 2004) and single gene studies (Arturi et al. 1997, Lazar et al. 1999, De Micco et al. 1999, Czarnocka et al. 2001, Le Fourn et al. 2004, Ambroziak et al. 2005, Arnaldi et al. 2005). Nevertheless, neither our approach nor the meta-analysis mentioned earlier indicated other thyroid-specific genes, confirming the lesser diagnostic potency of sodium iodide symporter, thyroglobulin, thyrotrophin receptor, or thyroid-specific transcription factors, shown to be down-regulated in previous single gene studies (Arturi et al. 1997, Lazar et al. 1999, Shimura et al. 2001, Scouten et al. 2004, Ambroziak et al. 2005, Wagner et al. 2005).

The top gene identified by our analysis, MPPED2, which is lost in PTC, was not previously considered for its role in PTC, although it was listed by Aldred et al. (2004, in the context of FTC) and by Mazzanti et al. (2004). It is an ancient gene, highly conserved from Caenorhabditis elegans to mammals and expressed in fetal brain. Its function is unknown.

The very first microarray-based analysis of a PTC gene expression profile (Huang et al. 2001) already indicated the dominant position of genes controlling cell–matrix adhesion and cell–cell communication. Besides FN1, mentioned earlier, and intercellular adhesion molecule 1 (ICAM1; Kawai et al. 1998), it seems important to mention syndecan 4 (SDC4), a transmembrane heparan sulfate proteoglycan known to bind FN1 and functioning also as a CXCL12 receptor in signal transduction (Huang et al. 2001, Chevillard et al. 2004, Finley et al. 2004a). Loss of CDH16 (kidney-specific cadherin; Thomson et al. 1998) was indicated for the first time in our study. This gene is closely related to cadherin E (CDH1), which is well known to be lost in a subgroup of PTCs with negative prognostic significance (Rocha et al. 2003), while cadherin P (CDH3) is up-regulated in PTC (Jarzab et al. 2005). Other genes involved in cell adhesion and present in our list comprise ectonucleoside triphosphate diphosphohydrolase 1 (ENTPD1) (up-regulated) and less known genes such as NEL-like 2 (up-regulated) and sushi, nidogen, and EGF-like domains 1 (down-regulated), both exhibiting EGF-like repeats (Watanabe et al. 1996). The GJB3 gene (connexin 31) encodes the protein subunit of gap junctions, essential for cell–cell communication.

DPP4 (CD26), ICAM1, and ENTPD1 (CD39) may be considered immune-related genes, although their expression is not confined to immune or endothelial cells. ICAM1 was shown to be present in thyroid cancer cells (Kawai et al. 1998). ENTPD1 (ecto-ATPase), in turn, has not been described before in the thyroid gland; its expression was shown in some other organs such as salivary glands or exocrine pancreas (Kittel et al. 2004). It converts adenine nucleotides to adenosine, thus participating in the control of signal transduction. DPP4, another membrane-bound enzyme, which hydrolyzes peptides engaged in paracrine and autocrine regulation, is up-regulated in PTCs at both the RNA and protein level (Huang et al. 2001, Kholova et al. 2003). The contribution of various enzymes to our list is striking: others, not described previously in the context of the thyroid gland, comprise UDP-galactose epimerase (GALE) and glutaminyl-peptide cyclotransferase (QPCT), both with virtually unknown expression patterns. The latter was also indicated by the meta-analysis of Griffith et al. (2006). Among the genes encoding enzymes lost in PTC are plasma glutamate carboxypeptidase (Gingras et al. 1999), not mentioned in any thyroid-related study before; carbonic anhydrase 4 (CA4); and even the well-known homogentisate oxidase gene (HGD), not previously related to the thyroid in any context, although listed in many microarray-based reports (Table 1).

Underexpression of hemoglobin transcripts (HBA1/A2 and HBB, ranked at positions 2 and 25 respectively) was already discussed in our earlier paper as a very characteristic feature of the PTC gene expression profile (Jarzab et al. 2005). We believe that the down-regulation of hemoglobin genes could be associated with tumor hypoxia; HBA has also been considered a tumor suppressor, since transduction of this gene into an anaplastic thyroid cancer cell line induces an anti-proliferative effect (Onda et al. 2005).

Many of the genes listed in Table 1 participate in signal transduction; among them are MET, ADORA1, RAB27A as well as tumor-associated calcium signal transducer 2, inositol 1,4,5-triphosphate receptor, type 1 (ITPR1), and ryanodine receptor 1, all up-regulated in PTC except for ITPR1. Some enzymes mentioned above (DPP4, ENTPD1, and QPCT) contribute to synthesis or breakdown of signaling molecules. On the other hand, the list also includes many genes participating in transcription regulation, among them high-mobility group AT-hook 2, aryl hydrocarbon receptor, retinoid X receptor γ, ID3, nuclear receptor-interacting protein 1, and RUNX1. Both of these functional classes are typical for cancer genes. We noted only one gene clearly related to apoptosis (and lost in PTC), the well-known BCL2. Interestingly enough, some immunohistochemical studies report its up-regulation in PTC (Aksoy et al. 2005).

Although the selected genes were obtained by analysis of PTC, many of them may also be found in other types of thyroid tumors (M Oczko-Wojciechowska, J Starzynski, M Jarzab, Z Wygoda, A Czarniecka, G Gala, M Kalemba, E Gubala & B Jarzab, unpublished data). This is convincingly illustrated by the overlap between the results of our analysis and those of one study that dealt with follicular thyroid tumors only (Barden et al. 2003).

Accuracy of discriminating PTC from benign/normal thyroid tissue

Our study is the first to define the classification accuracy for thyroid cancer with 95% CIs and one of the few dealing with the problem of diagnostic accuracy of microarray-derived classifiers (Kerr & Churchill 2001). Although the estimation of CIs by Monte Carlo analysis has not yet gained general acceptance, it is necessary to stress the very good accuracy of PTC diagnosis in our study, with the lower bound of the CI at 95%, obtained using a sufficiently large study group mimicking the real clinical setting. From a clinical point of view, however, an even higher accuracy is required of a PTC classifier, as the risk of diagnosing PTC in a thyroid nodule is only about 5% (Hegedus 2004).
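
The effect of a low pre-test risk on a classifier's usefulness can be illustrated with Bayes' rule. The sketch below is only an illustration: it assumes, hypothetically, that sensitivity and specificity are both equal to the quoted accuracy, which the study does not state; only the 5% prevalence is taken from the text, and the function name is ours.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test result."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical operating points; only the 5% prevalence comes from the text.
for acc in (0.95, 0.99):
    ppv = positive_predictive_value(acc, acc, prevalence=0.05)
    print(f"sensitivity = specificity = {acc:.2f} -> PPV = {ppv:.2f}")
```

Under these assumptions, a positive result at 95% sensitivity and specificity corresponds to a PTC probability of about 0.50, rising to about 0.84 at 99%, which is why a higher accuracy is desirable when only about 1 nodule in 20 is malignant.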

Our results stress the importance of multi-gene approaches for the molecular diagnosis of cancer. We observed that the lower limits of the accuracy CIs decreased when classification was based on gene sets with fewer than ten genes. The initial conclusion from these data is that any combination of more than five to ten genes increases the reliability of distinguishing between malignant and benign tissue samples. This result is similar to that obtained by Hua et al. (2005), who demonstrated on simulated and real breast cancer data that, for different classifiers, feature sets with fewer than five features were usually much less effective than larger ones. A recent paper reports a six-gene molecular classifier that is efficient for the molecular diagnosis of thyroid cancer (Kebebew et al. 2006).

Bootstrap-based multi-gene classification of PTC microarray data

Selection of genes is an important goal of microarray studies, contributing to a broader understanding of the cancer transcriptome as well as yielding novel molecular cancer markers. Such studies have been successfully performed in PTC, and large numbers of discriminating, physiologically relevant genes were proposed (Huang et al. 2001, Wasenius et al. 2003, Aldred et al. 2004, Chevillard et al. 2004, Finley et al. 2004a,b, Wreesmann et al. 2004, Baris et al. 2005, Detours et al. 2005, Giordano et al. 2005). However, in the majority of these studies, the selection of important genes was based on either fold-change or significance criteria obtained using classical statistical tests. These approaches favor either genes with large amplitudes, sometimes coming from a minor proportion of samples, or genes with low within-group variance, and thus rather stably expressed in all analyzed tumor samples. Bearing in mind the complexity of molecular changes in tumors, the widespread skepticism about a single ‘cancer marker’, and possible differences in histological subtypes or other features of PTC, we decided to use SVM, a routine machine-learning approach, to construct classifiers based on multiple features of the analyzed objects. This method allows integration of the information carried by many genes in the gene sets. Thus, effective molecular multi-gene classifiers may be built that rely on inter-gene interactions rather than on combining single ‘best markers’. SVMs have been confirmed as an effective method of multi-gene set selection, and this is supported by our comparison with other class prediction methods. Our procedure helps us to optimize the list of markers to be implemented in real-time quantitative PCR-supported fine needle biopsy (Lubitz & Fahey 2006).

From the diagnostic point of view, the major drawback of SVM-based methods is the fluctuation of gene content between classifiers of different size or based on slightly different training sets. To overcome this problem, we extended the original algorithm with bootstrap iterations, as recommended (Braga-Neto & Dougherty 2004). A bootstrap iteration consists of creating a temporary learning set (bootstrap sample) by sampling from the original set with replacement. Then, the classification rule is derived from the bootstrap sample and applied to the rest of the original set. Multiple selections of slightly different training sets represent the variability that may be observed between different thyroid cancer collections, laboratories, etc. Indeed, our current data generated using the bootstrap technique show much better agreement with the results of other thyroid cancer studies (Oczko-Wojciechowska et al. submitted) than data created by leave-one-out cross-validation of the whole dataset (Jarzab et al. 2005).
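
A single bootstrap iteration of the kind described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the classifier is a nearest-centroid stand-in (not an SVM), the data are invented toy values, and all function names are hypothetical.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw n indices with replacement; the never-drawn indices form the test set."""
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(drawn))
    return drawn, out_of_bag

def train_centroids(X, y, idx):
    """Stand-in classifier (NOT an SVM): per-class mean expression vectors."""
    centroids = {}
    for label in set(y[i] for i in idx):
        rows = [X[i] for i in idx if y[i] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Toy data: 2 "genes", 6 tumors (label 1) and 6 benign samples (label 0).
X = [[5.0 + 0.1 * i, 1.0] for i in range(6)] + [[1.0 + 0.1 * i, 1.2] for i in range(6)]
y = [1] * 6 + [0] * 6

rng = random.Random(0)
train_idx, test_idx = bootstrap_split(len(X), rng)
model = train_centroids(X, y, train_idx)   # rule derived from the bootstrap sample
errors = sum(predict(model, X[i]) != y[i] for i in test_idx)  # applied to the rest
print(f"{len(test_idx)} out-of-bag samples, {errors} misclassified")
```

Because sampling is done with replacement, roughly one-third of the original samples are left out of each bootstrap sample on average, and these serve as the independent test set for that iteration.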

Originally, in a bootstrap iteration one counts only the number of misclassifications. Since in all bootstrap iterations every step of data processing (gene selection and classifier training) has to be repeated (Simon et al. 2003), some additional knowledge can be gained. The procedure we used enables ranking of the genes that are most often present in the classifiers obtained from the different subsets of the training set (BBFR). Furthermore, it estimates the accuracy with appropriate CIs. Moreover, it allows ranking the samples according to the frequency of misclassifications (BBOD). The use of BBFR resulted in the delineation of genes that were either novel or not previously recognized for their contribution to the PTC gene expression profile, even if they were included in the large gene lists given in previous genomic studies. BBOD allowed us to reveal ‘difficult’ samples in the analyzed group. The two thyroid samples with the poorest accuracy of diagnosis were derived from patients with familial thyroid tumors, which suggests that their gene expression profiles may differ from sporadic ones. For the remaining samples, in 175 out of 180 cases (>97%) the percentage of correct diagnoses was >90%.
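
The bookkeeping behind gene ranking, sample ranking and accuracy CIs can be sketched as below. This is an illustration under stated assumptions, not the published procedure: gene selection uses a class-mean difference instead of SVM-RFE, classification uses nearest centroids instead of an SVM, the CI is a simple percentile interval over iterations, and the data and function names are hypothetical.

```python
import random
from collections import Counter

def select_genes(X, y, idx, k):
    """Stand-in selector (not SVM-RFE): top-k genes by absolute class-mean difference."""
    pos = [X[i] for i in idx if y[i] == 1]
    neg = [X[i] for i in idx if y[i] == 0]
    score = [abs(sum(p) / len(pos) - sum(n) / len(neg))
             for p, n in zip(zip(*pos), zip(*neg))]
    return sorted(range(len(score)), key=lambda g: -score[g])[:k]

def classify(X, y, idx, genes, x):
    """Stand-in classifier (not an SVM): nearest class centroid on the selected genes."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [X[i] for i in idx if y[i] == label]
        dist = sum((x[g] - sum(r[g] for r in rows) / len(rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def bootstrap_analysis(X, y, k=2, n_iter=200, seed=0):
    rng, n = random.Random(seed), len(X)
    gene_hits = Counter()                 # BBFR-style: how often each gene is selected
    tested, wrong = Counter(), Counter()  # BBOD-style: per-sample error bookkeeping
    accuracies = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]
        if not oob or len({y[i] for i in idx}) < 2:
            continue  # degenerate resample: nothing to test on, or one class missing
        genes = select_genes(X, y, idx, k)  # selection is repeated in every iteration
        gene_hits.update(genes)
        correct = 0
        for i in oob:
            tested[i] += 1
            if classify(X, y, idx, genes, X[i]) == y[i]:
                correct += 1
            else:
                wrong[i] += 1
        accuracies.append(correct / len(oob))
    accuracies.sort()
    lo = accuracies[int(0.025 * (len(accuracies) - 1))]
    hi = accuracies[int(0.975 * (len(accuracies) - 1))]
    gene_rank = [g for g, _ in gene_hits.most_common()]
    error_rate = {i: wrong[i] / tested[i] for i in tested}
    return gene_rank, error_rate, (lo, hi)

# Toy data: 4 "genes" x 20 samples; genes 0 and 1 differ between classes, 2 and 3 do not.
X = [[6.0 + 0.1 * i, 4.0 - 0.1 * i, 1.0 + 0.05 * (i % 3), 2.0] for i in range(10)]
X += [[1.0 + 0.1 * i, 1.0 + 0.1 * i, 1.0 + 0.05 * (i % 3), 2.0] for i in range(10)]
y = [1] * 10 + [0] * 10
X[0] = [1.0, 1.0, 1.0, 2.0]  # a deliberately atypical "tumor" that looks benign

gene_rank, error_rate, ci = bootstrap_analysis(X, y)
print("genes by selection frequency:", gene_rank)
print("most often misclassified sample:", max(error_rate, key=error_rate.get))
print("percentile interval for accuracy:", ci)
```

In this toy setting the two informative genes head the frequency ranking and the atypical sample stands out as the most frequently misclassified one, mirroring the roles of BBFR and BBOD.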

Recently, Zhang et al. (2006) published an SVM-based recursive method of gene selection. This method, called R-SVM, differs from the standard RFE algorithm used here in the criteria applied in the elimination steps. Moreover, the final gene subset is created on the basis of any resampling method used at the validation stage, which is similar to the approach presented here. Nevertheless, our bootstrap-based method allows the detection of outlier samples and provides CIs for the classification accuracy, which are much more informative than the accuracy estimate alone.
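
For readers unfamiliar with standard RFE (Guyon et al. 2002), the elimination loop can be sketched as follows: train a linear SVM, drop the feature with the smallest squared weight, and retrain on the remainder. The trainer below is a toy subgradient-descent stand-in with assumed hyperparameters, not the SVM implementation used in the study, and the data and function names are hypothetical. R-SVM is not shown.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=500):
    """Toy linear SVM: full-batch subgradient descent on the regularized hinge loss.
    Labels must be +1 / -1. Returns the weight vector and the bias."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def rfe_ranking(X, y):
    """Recursive feature elimination: retrain, drop the smallest w^2, repeat."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]  # most important feature first

# Toy data: feature 0 separates the classes strongly, feature 1 weakly, feature 2 not at all.
X = [[2.0, 0.5, s] for s in (1.0, -1.0, 1.0, -1.0)]
X += [[-2.0, -0.5, s] for s in (1.0, -1.0, 1.0, -1.0)]
y = [1] * 4 + [-1] * 4

print(rfe_ranking(X, y))
```

On this toy input the uninformative feature is eliminated first and the strongly separating feature survives, giving the ranking [0, 1, 2]. In a bootstrap setting this whole loop would be rerun inside every iteration.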

PTC and normal/benign difference versus inter-platform difference

To assure a sufficient number of tissue samples, it was necessary to combine data obtained using different generations of GeneChips, which cannot be compared by a direct approach (Eszlinger et al. 2006). The use of multi-gene classifiers, however, makes it possible to overcome this difficulty. We showed earlier that the classifier selected using the U133 platform (Jarzab et al. 2005) performs well on U95-obtained data and has high classification accuracy (Eszlinger et al. 2006). In the present paper, we demonstrate that it is possible, after correctly matching genes from two different generations of microarrays, to derive an efficient multi-gene classifier. When we included both benign and malignant samples from both platforms, the vast majority of these samples were properly classified. Using Affymetrix GeneChips, Barden et al. (2003) and Finley et al. (2004a,b) had previously reported 20 of the 43 genes now confirmed by us as diagnostically relevant for PTC. This is a level of agreement rarely noted for inter-group comparisons of microarray results.

Our analysis was performed on microarray data pre-processed by the standard MAS5 algorithm. Although many authors demonstrate the superiority of other pre-processing methods (e.g. RMA or GC-RMA; Irizarry et al. 2003), the MAS5 method still seems to be a reasonable approach for inter-platform comparisons. In the MAS5 algorithm, each array is processed independently, so the bootstrap procedure does not have to involve this step. Use of RMA pre-processing, which has to operate on the whole dataset, would pose the question of whether this step should also be bootstrapped. Presently, this is not feasible because of the huge computational demand of pre-processing large sample sets.

Redundancy of multi-gene cancer classifiers

The redundancy of multi-gene classifiers is inherently linked to the huge differences in gene expression profiles between individual tumors originating from the same tissue. This was indicated for the first time by Ein-Dor et al. (2005) in breast cancer. These authors re-analyzed the data of van’t Veer et al. (2002) and showed that multiple similar classifiers may be obtained; they have classification potency comparable to van’t Veer’s original 70-gene classifier but a different gene content. Ein-Dor et al. also stressed that even slight differences in the training set composition influenced the selected genes. Our analysis demonstrates that similar redundancy is present in PTC. This fact is frequently overlooked by authors interpreting the results of gene expression profile studies that involved only a few genes or were obtained in small groups of patients. In this paper, we propose a method of ranking genes according to their importance in multi-gene classifiers, with appropriate CIs indicating the robustness of the result.

To conclude, the primary goal of this study was to validate a novel SVM-based approach to the differentiation of PTC from benign thyroid lesions. This goal was achieved with a very satisfactory degree of accuracy, over 95%. Simultaneously, we were able to rank the genes most essential for the molecular diagnosis of PTC. Although the presented list of genes can be enlarged, we believe the first 40 genes are especially suitable for further prospective studies in fine needle biopsy material and may serve to construct multi-gene classifiers with potential application in the clinical setting. Comparison with other published microarray studies provides sufficient validation for the vast majority of these genes.


    Acknowledgements
 
We gratefully acknowledge Aleksander Sochanik, PhD, for the thorough language revision of the manuscript. This work was partially supported by the Polish Ministry of Education and Science under grants 3T11A 019 29 (K F) and 2P05A 022 30 (B J), by the Deutsche Krebshilfe grant 106542 (R P and K K) and the Interdisciplinary Center for Clinical Research at the Faculty of Medicine of the University of Leipzig (projects B20, Z03), and within the GENRISK-T project, contract number 036495 (A S, B J). The authors declare no potential conflict of interest.


    References
 
Aksoy M, Giles Y, Kapran Y, Terzioglu T & Tezelman S 2005 Expression of bcl-2 in papillary thyroid cancers and its prognostic value. Acta Chirurgica Belgica 105 644–648.

Aldred MA, Ginn-Pease ME, Morrison CD, Popkie AP, Gimm O, Hoang-Vu C, Krause U, Dralle H, Jhiang SM, Plass C et al. 2003 Caveolin-1 and caveolin-2, together with three bone morphogenetic protein-related genes, may encode novel tumor suppressors down-regulated in sporadic follicular thyroid carcinogenesis. Cancer Research 63 2864–2871.

Aldred MA, Huang Y, Liyanarachchi S, Pellegata NS, Gimm O, Jhiang S, Davuluri RV, de la Chapelle A & Eng C 2004 Papillary and follicular thyroid carcinomas show distinctly different microarray expression profiles and can be distinguished by a minimum of five genes. Journal of Clinical Oncology 22 3531–3539.

Ambroziak M, Pachucki J, Stachlewska-Nasfeter E, Nauman J & Nauman A 2005 Disturbed expression of type 1 and type 2 iodothyronine deiodinase as well as titf1/nkx2-1 and pax-8 transcription factor genes in papillary thyroid cancer. Thyroid 15 1137–1146.

Aratake Y, Nomura H, Kotani T, Marutsuka K, Kobayashi K, Kuma K, Miyauchi A, Okayama A & Tamura K 2006 Coexistent anaplastic and differentiated thyroid carcinoma: an immunohistochemical study. American Journal of Clinical Pathology 125 399–406.

Arnaldi LA, Borra RC, Maciel RM & Cerutti JM 2005 Gene expression profiles reveal that DCN, DIO1, and DIO2 are underexpressed in benign and malignant thyroid tumors. Thyroid 15 210–221.

Arturi F, Russo D, Giuffrida D, Ippolito A, Perrotti N, Vigneri R & Filetti S 1997 Early diagnosis by genetic analysis of differentiated thyroid cancer metastases in small lymph nodes. Journal of Clinical Endocrinology and Metabolism 82 1638–1641.

Baloch ZW & Livolsi VA 2002 Follicular-patterned lesions of the thyroid: the bane of the pathologist. American Journal of Clinical Pathology 117 143–150.

Barden CB, Shister KW, Zhu B, Guiter G, Greenblatt DY, Zeiger MA & Fahey TJ III 2003 Classification of follicular thyroid tumors by molecular signature: results of gene profiling. Clinical Cancer Research 9 1792–1800.

Baris O, Savagner F, Nasser V, Loriod B, Granjeaud S, Guyetant S, Franc B, Rodien P, Rohmer V, Bertucci F et al. 2004 Transcriptional profiling reveals coordinated up-regulation of oxidative metabolism genes in thyroid oncocytic tumors. Journal of Clinical Endocrinology and Metabolism 89 994–1005.

Baris O, Mirebeau-Prunier D, Savagner F, Rodien P, Ballester B, Loriod B, Granjeaud S, Guyetant S, Franc B, Houlgatte R et al. 2005 Gene profiling reveals specific oncogenic mechanisms and signaling pathways in oncocytic and papillary thyroid carcinoma. Oncogene 24 4155–4161.

Basolo F, Fiore L, Fusco A, Giannini R, Albini A, Merlo GR, Fontanini G, Conaldi PG & Toniolo A 1999 Potentiation of the malignant phenotype of the undifferentiated ARO thyroid cell line by insertion of the bcl-2 gene. International Journal of Cancer 81 956–962.

Belfiore A, Gangemi P, Costantino A, Russo G, Santonocito GM, Ippolito O, Di Renzo MF, Comoglio P, Fiumara A & Vigneri R 1997 Negative/low expression of the Met/hepatocyte growth factor receptor identifies papillary thyroid carcinomas with high risk of distant metastases. Journal of Clinical Endocrinology and Metabolism 82 2322–2328.

Berlingieri MT, Pierantoni GM, Giancotti V, Santoro M & Fusco A 2002 Thyroid cell transformation requires the expression of the HMGA1 proteins. Oncogene 21 2971–2980.

Boser B, Guyon I & Vapnik V 1992 A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Computational Learning Theory, Pittsburgh.

Braga-Neto U & Dougherty E 2004 Is cross-validation valid for small sample microarray classification? Bioinformatics 20 374–380.

Breiman L 1996 Bagging predictors. Machine Learning 24 123–140.

Cerutti JM, Delcelo R, Amadei MJ, Nakabashi C, Maciel RM, Peterson B, Shoemaker J & Riggins GJ 2004 A preoperative diagnostic test that distinguishes benign from malignant thyroid carcinoma based on gene expression. Journal of Clinical Investigation 113 1234–1242.

Chen KT, Lin JD, Chao TC, Hsueh C, Chang CA, Weng HF & Chan EC 2001 Identifying differentially expressed genes associated with metastasis of follicular thyroid cancer by cDNA expression array. Thyroid 11 41–46.

Cherian MG, Jayasurya A & Bay BH 2003 Metallothioneins in human tumors and potential roles in carcinogenesis. Mutation Research 533 201–209.

Chevillard S, Ugolin N, Vielh P, Ory K, Levalois C, Elliott D, Clayman GL & El-Naggar AK 2004 Gene expression profiling of differentiated thyroid neoplasms: diagnostic and clinical implications. Clinical Cancer Research 10 6586–6597.

Czarnocka B, Pastuszko D, Janota-Bzowski M, Weetman AP, Watson PF, Kemp EH, McIntosh RS, Asghar MS, Jarzab B, Gubala E et al. 2001 Is there loss or qualitative changes in the expression of thyroid peroxidase protein in thyroid epithelial cancer? British Journal of Cancer 85 875–880.

Dahl E, Winterhager E, Reuss B, Traub O, Butterweck A & Willecke K 1996 Expression of the gap junction proteins connexin31 and connexin43 correlates with communication compartments in extraembryonic tissues and in the gastrulating mouse embryo, respectively. Journal of Cell Science 109 191–197.

Detours V, Wattel S, Venet D, Hutsebaut N, Bogdanova T, Tronko MD, Dumont JE, Franc B, Thomas G & Maenhaut C 2005 Absence of a specific radiation signature in post-Chernobyl thyroid cancers. British Journal of Cancer 92 1545–1552.

Efron B 1979 Bootstrap methods: another look at the jackknife. Annals of Statistics 7 1–26.

Efron B 1983 Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association 78 316–331.

Ein-Dor L, Kela I, Getz G, Givol D & Domany E 2005 Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21 171–178.

Eszlinger M, Krohn K & Paschke R 2001 Complementary DNA expression array analysis suggests a lower expression of signal transduction proteins and receptors in cold and hot thyroid nodules. Journal of Clinical Endocrinology and Metabolism 86 4834–4842.

Eszlinger M, Krohn K, Frenzel R, Kropf S, Tonjes A & Paschke R 2004 Gene expression analysis reveals evidence for inactivation of the TGF-beta signaling cascade in autonomously functioning thyroid nodules. Oncogene 23 795–804.

Eszlinger M, Wiench M, Jarzab B, Krohn K, Beck M, Lauter J, Gubala E, Fujarewicz K, Swierniak A & Paschke R 2006 Meta- and reanalysis of gene expression profiles of hot and cold thyroid nodules and papillary thyroid carcinoma for gene groups. Journal of Clinical Endocrinology and Metabolism 91 1934–1942.

Fedele M, Pierantoni GM, Berlingieri MT, Battista S, Baldassarre G, Munshi N, Dentice M, Thanos D, Santoro M, Viglietto G et al. 2001 Overexpression of proteins HMGA1 induces cell cycle deregulation and apoptosis in normal rat thyroid cells. Cancer Research 61 4583–4590.

Finley DJ, Arora N, Zhu B, Gallagher L & Fahey TJ III 2004a Molecular profiling distinguishes papillary carcinoma from benign thyroid nodules. Journal of Clinical Endocrinology and Metabolism 89 3214–3223.

Finley DJ, Zhu B, Barden CB & Fahey TJ III 2004b Discrimination of benign and malignant thyroid nodules by molecular profiling. Annals of Surgery 240 425–436.

Le Fourn V, Ferrand M & Franc JL 2004 Differential expression of thyroperoxidase mRNA splice variants in human thyroid tumors. Biochimica et Biophysica Acta 1689 134–141.

Franc B, De La Salmoniere P, Lange F, Hong C, Louvel A, De Roquancourt A, Wild F, Hejblum G, Chevret S & Chastang C 2003 Interobserver and intraobserver reproducibility in the histopathology of follicular thyroid carcinoma. Human Pathology 34 1092–1100.

Freund Y & Schapire R 1996 Experiments with a new boosting algorithm. Proceedings of the 13th International Conference on Machine Learning Bari 325–332.

Frohlich E, Machicao F & Wahl R 2005 Action of thiazolidinediones on differentiation, proliferation and apoptosis of normal and transformed thyrocytes in culture. Endocrine-Related Cancer 12 291–303.

Furuya F, Shimura H, Miyazaki A, Taki K, Ohta K, Haraguchi K, Onaya T, Endo T & Kobayashi T 2004 Adenovirus-mediated transfer of thyroid transcription factor-1 induces radioiodide organification and retention in thyroid cancer cells. Endocrinology 145 5397–5405.

Ghinea N, Baratti-Elbaz C, De Jesus-Lucas A & Milgrom E 2002 TSH receptor interaction with the extracellular matrix: role on constitutive activity and sensitivity to hormonal stimulation. Molecular Endocrinology 16 912–923.

Gingras R, Richard C, El-Alfy M, Morales CR, Potier M & Pshezhetsky AV 1999 Purification, cDNA cloning, and expression of a new human blood plasma glutamate carboxypeptidase homologous to N-acetyl-aspartyl-alpha-glutamate carboxypeptidase/prostate-specific membrane antigen. Journal of Biological Chemistry 274 11742–11750.

Giordano TJ, Kuick R, Thomas DG, Misek DE, Vinco M, Sanders D, Zhu Z, Ciampi R, Roh M, Shedden K et al. 2005 Molecular classification of papillary thyroid carcinoma: distinct BRAF, RAS, and RET/PTC mutation-specific gene expression profiles discovered by DNA microarray analysis. Oncogene 24 6646–6656.

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al. 1999 Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286 531–537.

Green LM, Bianski BM, Murray DK, Rightnar SS & Nelson GA 2005 Characterization of accelerated iron-induced damage in gap junction-competent and -incompetent thyroid follicular cells. Radiation Research 163 172–182.

Griffith OL, Melck A, Jones SJ & Wiseman SM 2006 Meta-analysis and meta-review of thyroid cancer gene expression profiling studies identifies important diagnostic biomarkers. Journal of Clinical Oncology 24 5043–5051.

Guyon I, Weston J, Barnhill S & Vapnik V 2002 Gene selection for cancer classification using support vector machines. Machine Learning 46 389–422.

Hamada A, Mankovskaya S, Saenko V, Rogounovitch T, Mine M, Namba H, Nakashima M, Demidchik Y, Demidchik E & Yamashita S 2005 Diagnostic usefulness of PCR profiling of the differentially expressed marker genes in thyroid papillary carcinomas. Cancer Letters 224 289–301.

Haugen BR, Larson LL, Pugazhenthi U, Hays WR, Klopper JP, Kramer CA & Sharma V 2004 Retinoic acid and retinoid X receptors are differentially expressed in thyroid cancer and thyroid carcinoma cell lines and predict response to treatment with retinoids. Journal of Clinical Endocrinology and Metabolism 89 272–280.

Hegedus L 2004 Clinical practice. The thyroid nodule. New England Journal of Medicine 351 1764–1771.

Hoos A, Stojadinovic A, Singh B, Dudas ME, Leung DH, Shaha AR, Shah JP, Brennan MF, Cordon-Cardo C & Ghossein R 2002 Clinical significance of molecular expression profiles of Hurthle cell tumors of the thyroid gland analyzed via tissue microarrays. American Journal of Pathology 160 175–183.

Hua J, Xiong Z, Lowey J, Suh E & Dougherty ER 2005 Optimal number of features as a function of sample size for various classification rules. Bioinformatics 21 1509–1515.

Huang Y, Prasad M, Lemon WJ, Hampel H, Wright FA, Kornacker K, LiVolsi V, Frankel W, Kloos RT, Eng C et al. 2001 Gene expression in papillary thyroid carcinoma reveals highly consistent profiles. PNAS 98 15044–15049.

Illario M, Amideo V, Casamassima A, Andreucci M, di MT, Miele C, Rossi G, Fenzi G & Vitale M 2003 Integrin-dependent cell growth and survival are mediated by different signals in thyroid cells. Journal of Clinical Endocrinology and Metabolism 88 260–269.

Ippolito A, Vella V, La Rosa GL, Pellegriti G, Vigneri R & Belfiore A 2001 Immunostaining for Met/HGF receptor may be useful to identify malignancies in thyroid lesions classified suspicious at fine-needle aspiration biopsy. Thyroid 11 783–787.

Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U & Speed T 2003 Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4 249–264.

Jacques C, Baris O, Prunier-Mirebeau D, Savagner F, Rodien P, Rohmer V, Franc B, Guyetant S, Malthiery Y & Reynier P 2005 Two-step differential expression analysis reveals a new set of genes involved in thyroid oncocytic tumors. Journal of Clinical Endocrinology and Metabolism 90 2314–2320.

Jarzab B, Wiench M, Fujarewicz K, Simek K, Jarzab M, Oczko-Wojciechowska M, Wloch J, Czarniecka A, Chmielik E, Lange D et al. 2005 Gene expression profile of papillary thyroid cancer: sources of variability and diagnostic implications. Cancer Research 65 1587–1597.

Kawai K, Resetkova E, Enomoto T, Fornasier V & Volpe R 1998 Is human leukocyte antigen-DR and intercellular adhesion molecule-1 expression on human thyrocytes constitutive in papillary thyroid cancer? Comparative studies in human thyroid xenografts in severe combined immunodeficient and nude mice. Journal of Clinical Endocrinology and Metabolism 83 157–164.

Kebebew E, Peng M, Reiff E & McMillan A 2006 Diagnostic and extent of disease multigene assay for malignant thyroid neoplasms. Cancer 106 2592–2597.

Kehlen A, Lendeckel U, Dralle H, Langner J & Hoang-Vu C 2003 Biological significance of aminopeptidase N/CD13 in thyroid carcinomas. Cancer Research 63 8500–8506.

Kerr MK & Churchill GA 2001 Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. PNAS 98 8961–8965.

Kholova I, Ludvikova M, Ryska A, Topolcan O, Pikner R, Pecen L, Cap J & Holubec L Jr 2003a Diagnostic role of markers dipeptidyl peptidase IV and thyroid peroxidase in thyroid tumors. Anticancer Research 23 871–875.

Kholova I, Ryska A, Ludvikova M, Cap J & Pecen L 2003b Dipeptidyl peptidase IV expression in thyroid cytology: retrospective histologically confirmed study. Cytopathology 14 27–31.

Kiess M, Scharm B, Aguzzi A, Hajnal A, Klemenz R, Schwarte-Waldhoff I & Schafer R 1995 Expression of ril, a novel LIM domain gene, is down-regulated in Hras-transformed cells and restored in phenotypic revertants. Oncogene 10 61–68.

Kim HS, Roh CR, Chen B, Tycko B, Nelson DM & Sadovsky Y 2007 Hypoxia regulates the expression of PHLDA2 in primary term human trophoblasts. Placenta 28 77–84.

Kittel A, Csapo ZS, Csizmadia E, Jackson SW & Robson SC 2004 Co-localization of P2Y1 receptor and NTPDase1/CD39 within caveolae in human placenta. European Journal of Histochemistry 48 253–259.

Klopper JP, Hays WR, Sharma V, Baumbusch MA, Hershman JM & Haugen BR 2004 Retinoid X receptor-gamma and peroxisome proliferator-activated receptor-gamma expression predicts thyroid carcinoma cell response to retinoid and thiazolidinedione treatment. Molecular Cancer Therapeutics 3 1011–1020.

Kohrle J 1999 Local activation and inactivation of thyroid hormones: the deiodinase family. Molecular and Cellular Endocrinology 151 103–119.

Kromminga A, Hagel C, Arndt R & Schuppert F 1998 Serological reactivity of recombinant 1D autoantigen and its expression in human thyroid and eye muscle tissue: a possible autoantigenic link in Graves’ patients. Journal of Clinical Endocrinology and Metabolism 83 2817–2823.

Lazar V, Bidart JM, Caillou B, Mahe C, Lacroix L, Filetti S & Schlumberger M 1999 Expression of the Na+/I– symporter gene in human thyroid tumors: a comparison study with other thyroid-specific genes. Journal of Clinical Endocrinology and Metabolism 84 3228–3234.

Lelievre V, Muller JM & Falcon J 1998 Adenosine modulates cell proliferation in human colonic adenocarcinoma. I. Possible involvement of adenosine A1 receptor subtypes in HT29 cells. European Journal of Pharmacology 341 289–297.

Letsas KP, Frangou-Lazaridis M, Skyrlas A, Tsatsoulis A & Malamou-Mitsi V 2005 Transcription factor-mediated proliferation and apoptosis in benign and malignant thyroid lesions. Pathology International 55 694–702.

Lin F, Ren XD, Doris G & Clark RA 2005 Three-dimensional migration of human