cDNA sequences for transcription factors and signaling proteins of the hemichordate Saccoglossus kowalevskii: efficacy of the expressed sequence tag (EST) approach for evolutionary and developmental studies of a new organism.
We describe a collection of expressed sequence tags (ESTs) for Saccoglossus kowalevskii, a direct-developing hemichordate valuable for evolutionary comparisons with chordates. The 202,175 ESTs represent 163,633 arrayed clones carrying cDNAs prepared from embryonic libraries, and they assemble into 13,677 continuous sequences (contigs), leaving 10,896 singletons (excluding mitochondrial sequences). Of the contigs, 53% had significant matches when BLAST was used to query the NCBI databases (< or = 10(-10)), as did 51% of the singletons. Contigs most frequently matched sequences from amphioxus (29%), chordates (67%), and deuterostomes (87%). From the clone array, we isolated 400 full-length sequences for transcription factors and signaling proteins of use for evolutionary and developmental studies. The set includes sequences for fox, pax, tbx, hox, and other homeobox-containing factors, and for ligands and receptors of the TGFbeta, Wnt, Hh, Delta/Notch, and RTK pathways. At least 80% of key sequences have been obtained, when judged against gene lists of model organisms. The median length of these cDNAs is 2.3 kb, including 1.05 kb of 3' untranslated region (UTR). Only 30% are entirely matched by single contigs assembled from ESTs. We conclude that an EST collection based on 150,000 clones is a rich source of sequences for molecular developmental work, and that the EST approach is an efficient way to initiate comparative studies of a new organism.