Structural similarities reveal an expansive conotoxin family with a two-finger toxin fold
Abstract
Venomous animals have evolved a diverse repertoire of toxins with considerable pharmaceutical potential. The rapid evolution of peptide toxins, such as the conotoxins produced by venomous marine cone snails, often complicates efforts to infer their evolutionary relationships based solely on sequence information. Structural bioinformatics, however, can provide robust support. Here, we first solve the NMR structure of a macro-conotoxin from the MLSML superfamily, Tx33.1, which is composed of 124 residues, including 12 cysteines. We then apply deep learning-based methods for structure prediction and comparison to identify structural similarities between this toxin and five additional, previously uncharacterized conotoxin superfamilies. Although only three of these superfamilies exhibit sequence homology, a combined approach incorporating structure prediction, structure comparison, and gene structure analysis supports the conclusion that all six superfamilies share a common evolutionary past. The Tx33.1 NMR structure displays similarity to the first two domains of Argos, a secretory protein from Drosophila melanogaster that comprises three domains, each harboring two short β-stranded loops (“fingers”). Consequently, we propose the name “two-finger toxin (2FTX)” fold for this type of domain. Finally, using structure similarity searches, we identify a wide range of 2FTX proteins in protostomes, including non-venom-derived, secretory cone snail proteins. This study demonstrates how structural bioinformatics can be employed to uncover evolutionary relationships among rapidly evolving genes. It simultaneously identifies a large, previously unrecognized group of protostome 2FTX proteins, many of which exhibit close structural similarity to Argos and may perform a similar function in regulating EGFR signaling.
Article type: Research Article
Keywords: Argos, cone snails, conotoxin, EGFR signaling, protein evolution, three-finger toxins, venom
Affiliations: Department of Biology, Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Copenhagen, Denmark; Department of Biochemistry, University of Utah, Salt Lake City, Utah, United States of America; Department of Biomedical Sciences, University of Copenhagen, Denmark; School of Biological Sciences, University of Utah, Salt Lake City, Utah, United States of America; MIRECC, George E. Whalen Veterans Affairs Medical Center, Salt Lake City, Utah, United States of America; Department of Psychiatry, University of Utah, Salt Lake City, Utah, United States of America; Mental Health Department, George E. Whalen Veterans Affairs Medical Center, Salt Lake City, Utah, United States of America
License: CC BY 4.0 This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
Article links: DOI: 10.1101/2025.07.03.662903 | PubMed: 40631153 | PMC: PMC12236594
Relevance: Moderate: mentioned 3+ times in text
Full text: PDF (40 KB)
Introduction
Animal venoms are rich in bioactive molecules that are used to kill prey and deter predators. Peptide toxins specifically target molecules such as ion channels, G-protein-coupled receptors, and transporters in their recipients. Due to high target specificity and potency, peptide toxins are valuable compounds that can be explored for use as research tools and potential drug leads. Currently, 11 venom-derived peptides originating from cone snails, snakes, leeches, and lizards have been approved for clinical applications (ref. Bordon, et al. 2020).
Often, toxin evolution involves duplication of a gene encoding a protein with an endophysiological function followed by the “recruitment” of the duplicated copy into venom in a process accompanied by neo- and/or sub-functionalization (ref. Jackson and Koludarov 2020). Small peptide toxins generally lack a classical hydrophobic core – instead, most are structurally stabilized by disulfide bonds involving conserved cysteine residues. This characteristic feature provides a stable scaffold that can evolve novel functional properties through variation on inter-cysteine positions, which correspond to surface-exposed residues often exhibiting high evolutionary rates driven by positive selection (ref. Woodward, et al. 1990).
Given the often high mutational rates of peptide toxins combined with their small size, inferring evolutionary histories based solely on sequences can be challenging. Because three-dimensional structure is more conserved through evolution than sequence (ref. Illergard, et al. 2009), structural analysis can aid in establishing distant evolutionary relationships (ref. Undheim, et al. 2016), e.g., between toxins and their ancestral endophysiological counterparts or among evolutionarily related toxins with highly diverged sequences. For instance, structural homology of the spider and centipede toxins Ta1a and Ssm6a revealed the convergent recruitment of these peptides from endophysiological peptides of the crustacean hyperglycemic hormone (CHH) family into venom (ref. Undheim, et al. 2015). Conversely, the AlphaFold (AF)-predicted structure of a newly identified cone snail toxin was used in a structure similarity search to identify the first endogenous molluscan peptide of the CHH family (ref. Koch, et al. 2023). In another example, we recently established an evolutionary relationship between the con-ikot-ikot and Mu8.1 conotoxins that have highly divergent mature sequences but similar three-dimensional and gene structures (ref. Hackney, et al. 2023).
Conotoxins represent a group of particularly fast-evolving gene products (ref. Duda and Palumbi 1999). Each of the approximately 1000 extant species of cone snails expresses hundreds of mainly disulfide-rich toxins. Conotoxins are produced as precursor sequences encoding a signal sequence for targeting to the endoplasmic reticulum (ER), often followed by a propeptide that is proteolytically removed upon exit from the ER, and finally, the mature sequence. Although larger toxins are also known, most characterized conotoxins are short (approx. 15–45 residues for the mature peptide region). Conventionally, conotoxins are grouped into superfamilies based on similarity in their signal sequence regions. To date, approximately 70 gene superfamilies of conotoxins have been described (ref. Robinson, et al. 2014; ref. Safavi-Hemami, et al. 2018; ref. Koch, et al. 2024).
Conotoxins can be classified into structurally diverse classes that typically share specific biological activities towards target proteins, although exceptions exist. Known structures include, e.g., the inhibitor cystine knot (ICK) (ref. Undheim, et al. 2016), Kunitz (ref. Mourao and Schwartz 2013), insulin (ref. Laugesen, et al. 2022), saposin-like (ref. Hackney, et al. 2023), and mini-granulin folds (ref. Nielsen, et al. 2019; ref. Raffaelli, et al. 2024). Structural motifs such as the ICK fold, conserved cysteine-rich domain, and Kunitz-type folds are found across a wide range of venomous animals. By contrast, despite their prevalence in caenophidian snakes (ref. Koludarov, et al. 2023), the three-finger toxins (3FTXs) have apparently not been found in other venomous animals. 3FTXs belong to the Ly6/uPAR superfamily of proteins that share the same overall fold consisting of a disulfide-bonded globular core from which three β-stranded loops (or “fingers”) protrude (ref. Kessler, et al. 2017; ref. Leth, et al. 2019; ref. Koludarov, et al. 2023).
In recent years, technological advances in the fields of transcriptomics and proteomics have greatly expanded the number of available conotoxin sequences (ref. Terrat, et al. 2012; ref. Safavi-Hemami, et al. 2014; ref. Peng, et al. 2016; ref. Degueldre, et al. 2017; ref. Gao, et al. 2018; ref. von Reumont, et al. 2022; ref. Fedosov, et al. 2023). It has thus become apparent that several previously uncharacterized toxin superfamilies encode proteins exceeding 50 amino acid residues (ref. Koch, et al. 2024). These “macro-conotoxins” have been challenging to produce and characterize due to their complicated structures, often comprising more than three disulfide bonds. However, the development of various systems for recombinant expression of disulfide-rich proteins (ref. Klint, et al. 2013; ref. Nielsen, et al. 2019; ref. Rivera-de-Torre, et al. 2021; ref. Hackney, et al. 2023), along with advances in chemical synthesis and protein ligation methods (recently reviewed in (ref. Ho, et al. 2025)), has made these proteins amenable to biochemical and pharmacological characterization. Since macro-conotoxins are less well-characterized than short conotoxins, their investigation is bound to provide new insights into conotoxin structure, function, and evolution.
Here, we use structure- and sequence-based analyses of six previously uncharacterized macro-conotoxin superfamilies to uncover their evolutionary relationships. Using sequences and structures from these superfamilies as queries, we extend our analysis beyond cone snails to identify a large group of protostome proteins that adopt a fold similar to Argos, a secretory protein from Drosophila melanogaster that binds the epidermal growth factor (EGF) ligand called Spitz and harbors three two-finger toxin (2FTX) domains. Our findings demonstrate that combining structure determination and prediction with bioinformatics analyses provides a powerful approach for uncovering evolutionary connections between seemingly unrelated sequences. Moreover, this work identifies a previously unrecognized and extensive group of protostome 2FTX proteins of unknown functions.
Results
Recombinant production of a newly identified macro-conotoxin, Tx33.1, in the fully oxidized form
To identify novel classes of macro-conotoxins, we analyzed the transcriptome of the venom gland of the mollusk-hunter Conus textile, a species with well-characterized short conotoxins (ref. Ueberheide, et al. 2009). This led to the identification of an abundant transcript (transcript per million (TPM) value: 2,188) encoding a peptide with an N-terminal signal sequence and a predicted mature toxin of 124 residues (Fig. 1A and Fig. S1A). The precursor sequence of this toxin lacks a pro-peptide region. This toxin was named Tx33.1 (ref. Robinson, et al. 2017), where “Tx” denotes the two-letter species abbreviation for C. textile, “33” indicates the cysteine framework, and “1” signifies that it is the first toxin described from this gene family. To identify orthologs of Tx33.1 in other cone snail species we employed similarity searches against assembled cone snail venom gland transcriptomes, the NCBI non-redundant protein database, and NCBI transcriptome shotgun assembly cone snail databases. Using this approach, we identified 63 sequences (of which 50 were full-length; see Supporting File 1) from 25 species representing snail-, worm- and fish-hunting Conus species. The signal sequences contain a “MLSMLAWTLMTAMVVMNA” consensus motif (Fig. S1A). Based on the first five residues of the signal sequence, this superfamily of conotoxins is designated as the MLSML superfamily (ref. Koch, et al. 2024). All identified sequences contain 12 conserved cysteine residues in the mature toxin region (Fig. 1A and Fig. S1A).

To further investigate this toxin, Tx33.1 was recombinantly expressed in BL21(DE3) E. coli cells using the csCyDisCo system (ref. Nielsen, et al. 2019; ref. Bertelsen, et al. 2021), a derivative of the original CyDisCo system (ref. Gaciarz, et al. 2017). Briefly, this systems allows for production of correctly folded, fully oxidized disulfide-rich proteins in the bacterial cytosol by co-expression of three foldases – the Erv1p oxidase, as well as human protein disulfide isomerase (hPDI) and a conotoxin-specific PDI (csPDI) found to be highly expressed in the venom gland of many cone snail species (ref. Safavi-Hemami, et al. 2016). This platform has previously been employed to produce other conotoxins in their correctly folded forms (ref. Nielsen, et al. 2019; ref. Hackney, et al. 2023; ref. Müller, et al. 2023). Tx33.1 was expressed as a C-terminal fusion to an engineered variant of ubiquitin (Ub) containing 10 consecutive histidines (Ub-His10) with a cleavage site for Tobacco Etch Virus (TEV) protease, allowing for the release of the mature toxin (ref. Rogov, et al. 2012).
SDS-PAGE analysis showed that co-expression of Ub-His10-Tx33.1 with Erv1p, hPDI, and csPDI significantly increased the yield of soluble protein, with the large majority of Ub-His10-Tx33.1 appearing in the soluble fraction (Fig. S2A). Subsequent purification steps, involving immobilized metal affinity chromatography (IMAC), TEV protease cleavage, reverse IMAC, and anion exchange chromatography, yielded highly pure protein (>95% as judged by SDS-PAGE analysis; Fig. S2B), with a final yield of approximately 8 mg/L of culture.
Containing 12 cysteine residues, Tx33.1 is predicted to form six disulfide bonds. Full oxidation was confirmed using Q-TOF mass spectrometry, which measured a monoisotopic molecular mass of 13,703.80 Da, closely matching the theoretical mass of 13,703.95 Da for the fully oxidized protein (Fig. S3A). Additional evidence that Tx33.1 adopts a folded conformation was provided by a well-dispersed [1H–15N]-heteronuclear single quantum coherence spectrum (Fig. S3B).
Tx33.1 displays a previously unknown toxin fold
We next determined the three-dimensional structure of Tx33.1 in solution using NMR spectroscopy. In total, 83% of all chemical shifts were assigned, including 88% of backbone 1H, 13C, and 15N resonances, and 80% of sidechain resonances. A total of 1,320 short-range distance restraints derived from NOESY spectra, together with 170 dihedral angle restraints calculated from chemical shifts, were used in structure calculations (Table 1).
Table 1.: Statistics for the ensemble of 20 conformers representing Tx33.1 (PDB: 9FRQ)
| Experimental restraints | |
| NOE based restraints | |
| Intraresidual (|i – j| = 0) | 247 |
| Sequential (|i – j| = 1) | 357 |
| Medium range (2 ≤ |i – j| ≤ 4) | 166 |
| Long range (|i – j| ≥ 5) | 550 |
| Total | 1320 |
| Dihedral angles | 170 |
| Disulfide bonds | 6 |
| Restraint violation statisticsa | |
| NOE-based distances (Å) | 0.030 ± 0.001 |
| Dihedral angles (°) | 0.6 ± 0.1 |
| Ramachandran plot statisticsb | |
| Most favored regions | 90.5% |
| Allowed regions | 8.1% |
| Generously allowed regions | 1.0% |
| Disallowed regions | 0.4% |
| Coordinate precisionc | |
| β-strands | |
| Backbone N, Cα and C’ | 0.35 ± 0.08 Å |
| All heavy atoms | 0.8 ± 0.2 Å |
| Residues 5-124 | |
| Backbone N, Cα and C’ | 0.8 ± 0.2 Å |
| All heavy atoms | 1.3 ± 0.3 Å |
a XPLOR-NIH,
b PROCHECK,
c MOLMOL
The Tx33.1 structure is well defined from residue 5 to residue 124, with a pairwise rootmean-square deviation (RMSD) of 0.8 ± 0.2 Å for the 20 models representing the structure (Fig. S4). The core of the Tx33.1 structure constitutes a seven-stranded β-barrel (Fig. 1B), with a pairwise RMSD of 0.35 Å. The residues prior to and immediately following β1 constitute a loop region held together by the first four disulfide bonds (Cys6–Cys30, Cys15–Cys34, Cys22–Cys42 and Cys36–Cys69; Fig. 1). A small alpha helix (α1) comprising six residues is present between β1 and β2. The last two disulfide bonds (Cys78–Cys100 and Cys102–Cys118; Fig. 1) connect loop regions in the C-terminal region of the protein. A notably long, yet well-defined loop of 10 residues is present between β2 and β3. The loop between β4 and β5 comprises six residues and constitutes a lid-like structure placed at one end of the β-barrel. Three residues, Gly23, His48, and Thr57, are strictly conserved within the MLSML family, suggesting they may play a functional role.
In summary, the NMR structure of Tx33.1 revealed a fold rich in β-strand structure that has not previously been observed in known venom proteins.
Bioinformatics uncovers six structurally and evolutionarily related macro-conotoxin superfamilies
Using the sequence and structure of Tx33.1, we next investigated the potential presence of related families of macro-conotoxins. We used structural homology searching against predicted structures of a nonredundant set of 3,412 conotoxin precursors extracted from venom gland RNAseq data from 42 cone snail species (ref. Koch, et al. 2024). Using this approach, we identified five additional uncharacterized conotoxin families – the MMLFM, “Unknown”, MARFL, unk2 and E superfamilies – with a near-identical cysteine framework (Fig. 2A) that exhibit structural similarity to Tx33.1 (Fig. 2B). Structure-based phylogenetic analysis supported this notion, demonstrating high similarity among the families (Fig. 2C). However, on a sequence-level, only the MMLFM and Unknown superfamilies show similarity to the MLSML conotoxins, as demonstrated using sequence-based CLANS (CLuster ANalysis of Sequences) clustering (ref. Frickey and Lupas 2004) (Fig. 2D) – a method for visualizing protein families based on all-against-all pairwise protein sequence similarities – and maximum likelihood tree reconstruction (Fig. 2E). Overall, the phylogenetic reconstruction based on the structural predictions (Fig. 2C) revealed a much stronger association among the six conotoxin superfamilies compared to sequence-based analyses (Fig. 2E). While the superfamilies largely formed monophyletic groups in the structural analysis (Fig. 2E), the short branch lengths of the deep branches in the tree shown in Fig. 2C suggest a common ancestor for all six superfamilies. Multiple sequence alignments of the proteins from the MLSML, MMLFM, Unknown, MARFL, unk2, and E superfamilies are provided in Fig. S1. All protein sequences used for the analysis in Fig. 2 are included in Supporting File 2.

To further investigate whether the six macro-conotoxin superfamilies share characteristic features with classical conotoxins, we examined their expression levels, signatures of positive selection, and their degree of sequence variation. Analysis of the sequences of the six identified superfamilies revealed that in addition to high transcriptional levels (Fig. S5 and Supporting File 3), five of the six superfamilies show signs of positive selection (due to the limited sample size, the unk2 superfamily was not included in this analysis). Estimated pairwise dN/dS ratios showed mean dN/dS values of 6.91, 1.93, 2.48, 2.40, and 3.31 across the MLSML, MMLFM, Unknown, MARFL, and E-superfamilies, respectively. Site-models of selection (comparing M7 and M8 models in the Phylogenetic Analysis by Maximum Likelihood (PAML) package (ref. Yang 1997)) further supported the presence of positive selection at multiple sites throughout the sequences of these toxins (72 of 179 codons in MLSML, 66 of 165 codons in MMLFM, 81 of 153 codons in Unknown, 44 of 162 codons in MARFL, and 43 of 96 codons in E). Plotting the sequence conservation along the precursors further revealed that several inter-cysteine regions exhibit a high degree of variation (Fig. S6). These sequence features, typical of conotoxin-encoding genes, support the role of these proteins as functional venom components.
In addition to the structural similarity, gene structure analysis using the published genomes of Conus ventricosus, Conus betulinus, and Conus canariensis, demonstrated that five of the six superfamilies (MLSML, MMLFM, Unknown, MARFL, and unk2) share similar gene structures, each comprising four precursor-encoding exons with conserved intron phases (0, 2, and 0, respectively, with a single exception) (Fig. S7). E-superfamily genes deviate slightly from this structure and are encoded on three exons that share intron phases (0 and 2) with the first two introns of the other superfamilies (Fig. S7). The positions of the conserved cysteines involved in disulfide-bond formation are well preserved across the exons in all superfamilies, further supporting a common evolutionary origin (Fig. S7).
Collectively, these findings reveal that the six conotoxin superfamilies share a common evolutionary origin and possess similar three-dimensional structures, despite strong divergence at non-cysteine positions in three of the superfamilies.
Comparative structural analysis of Tx33.1 and AF-predicted conotoxin structures
The results in Fig. 2B demonstrate structural similarity among proteins from the six superfamilies at an overall level based on their AF-predicted structures. Consequently, we conducted a more detailed comparison of the NMR structure of Tx33.1 with the predicted structures of the other five superfamilies.
First, we note that the NMR and AF-predicted structures of Tx33.1 are largely identical with an RMSD of 1.5 Å (Fig. 3A), demonstrating AlphaFold’s accuracy in predicting this fold. The main differences are observed in the N-terminal region, where the predicted structure includes an additional β-strand. Moreover, β1 in the NMR structure exhibits a slightly different orientation compared to the corresponding strand in the predicted structure. These subtle differences lead to the formation of the β-barrel in the NMR structure, whereas the predicted structure rather displays two domains, each containing two short β-hairpins. This same overall fold is also observed in the predicted structures of proteins from the MMLFM (Fig. 3B), Unknown, MARFL, and unk2 superfamilies (Fig. S8), although the number of predicted β-strands varies slightly. In the E-superfamily, the mature sequences consist of approximately 65 residues that align with the N-terminal half of the proteins from the five other conotoxin superfamilies (Fig. 2A and Fig. S2E). Accordingly, the AF-predicted structures of E-superfamily proteins are similar to the structure of the N-terminal half of Tx33.1 (Fig. S8D).

Overall, the NMR structure determination of Tx33.1 and the comparison with the AF- predicted structures of proteins from the MMLFM, Unknown, MARFL, unk2, and E superfamilies revealed the details of the structural similarity apparent from Fig. 2B. We found that only the Tx33.1 NMR structure displays a β-barrel structure, while small variations in the orientation of a few residues cause the predicted structures to instead display two domains, each containing two short β-hairpins. Moreover, the long protruding loop between β2 and β3 is a feature exclusively observed in the proteins of the MLSML superfamily, such as Tx33.1.
Identification of a non-venom cone snail gene encoding a secretory protein with structural similarity to the conotoxins of the six superfamilies
Toxins are often recruited into the venom gland from genes encoding endophysiological proteins. To potentially identify the cone snail gene from which conotoxins of the six investigated superfamilies were originally recruited, we first employed the NMR structure of Tx33.1 in an online structure similarity search using Foldseek (ref. van Kempen, et al. 2024). This search identified a predicted protein with a similar structure in Pomacea canaliculata, a freshwater snail for which a reference genome sequence is available (ref. Lu, et al. 2024). Using the amino acid sequence of the Tx33.1 structural homolog from P. canaliculata, we employed tblastn to extract the genomic sequences of homologous genes identified in C. betulinus and C. ventricosus.
The amino acid sequences of these cone snail proteins contain a predicted signal sequence and show the same cysteine pattern as the MMLFM conotoxins, which were consequently selected for further comparison (Fig. 4). The endogenous cone snail protein sequences exhibit substantially greater sequence conservation than the toxin sequences (Fig. 4A). The identified endogenous cone snail proteins are longer by approximately 55 residues than the MMLFM conotoxins. As expected, AF-predicted structures of MMLFM family toxins and the endogenous cone snail proteins exhibit the same overall fold (Fig. 4B), although the β-strands in the conotoxins are generally shorter. The newly identified proteins display three structural domains, each comprising two short β-hairpins (D1, D2, and D3), whereas the conotoxins lack the region encoding D3 (Fig. 4A and B).

Both groups of proteins are encoded by four exons (Fig. 4C). However, the two gene structures display different intron phases (“0, 1, 0” versus “0, 2, 0”). Closer examination of the gene structures encoding the newly identified proteins reveals that D1 is encoded by the second exon and the beginning of the third, which also encodes the entirety of D2. By contrast, the signal sequence and D3 are each encoded exclusively on separate exons (exon one and exon four, respectively). In contrast, MMLFM conotoxin gene structures show that the signal sequence is exclusively encoded by the first exon, while the mature portion of these proteins is encoded by the three remaining exons, with no clear correlation between intron-exon boundaries and the domain architecture of the protein. The observed dissimilarity in gene structure suggests that the toxin and endogenous protein either do not share a common origin or that their gene structures have diverged significantly since their last common ancestor (see Discussion).
Overall, cone snails encode a predicted secretory protein that is structurally similar in terms of the general fold to the conotoxins of the six investigated superfamilies, but with highly diverged sequences and different phases and positions of the introns.
Tx33.1 is structurally similar to Argos, a secretory protein from D. melanogaster
The discovery of an endogenous cone snail protein and conotoxins displaying similar structural folds prompted us to seek further insight into the prevalence of the observed fold and its associated function. When using the Tx33.1 structure as a query in a Foldseek search, the top hit identified in the Protein Data Bank (PDB) was the crystal structure of Argos, also known as giant lens protein, from D. melanogaster (Fig. 5A). Argos is a secretory protein that modulates epidermal growth factor receptor (EGFR) signaling by sequestering the EGF-like ligand called Spitz, thereby regulating diverse developmental processes in Drosophila (ref. Freeman, et al. 1992; ref. Klein, et al. 2004). The crystal structure of Argos consists of three separate β-sheet domains (each comprising four strands) of which Domain 2 (D2) and Domain 3 (D3) constitute a clamplike structure that binds Spitz (Fig. S9) (ref. Klein, et al. 2008). Notably, to enable crystallization, two predicted flexible regions of the Argos sequence were deleted: the first 122 residues and a region of 119 residues located within D1 (Fig. S10). As previously noted (ref. Klein, et al. 2008), the individual domains of Argos adopt the same fold as 3FTXs, such as α-bungarotoxin, with the notable distinction that each domain comprises only two “fingers” (Fig. 5B).

The structural comparison of Tx33.1 and Argos (Fig. 5A) shows a good structural correlation (RMSD = 4.2 Å) between the D1-D2 region of Argos and Tx33.1. Moreover, the positions of the last five of the six disulfide bonds in Tx33.1 are structurally conserved in Argos (Fig. 5A). The overlay of the region comprising residues 1-71 of Tx33.1 with α-bungarotoxin (Fig. 5C) shows an overall structural similarity with an RMSD = 4.2 Å in the region of the snake toxin comprising the first two fingers. Notably, the position of the extended loop in Finger II of α-bungarotoxin, which constitutes an important binding site for several types of nicotinic acetylcholine receptors (nAChRs) (ref. Zouridakis, et al. 2014; ref. Kessler, et al. 2017), corresponds to the position of the long loop between β2 and β3 in Tx33.1. The regions comprising the N-terminal half of the AF-predicted structures of the MMLFM, Unknown, MARFL, unk2 and E-superfamily toxins also exhibit structural similarity to 3FTXs (not shown). Due to their overall structural similarity to 3FTXs, but with the notable difference of a “missing” finger, we propose the collective name “two-finger toxins” (2FTXs) for the conotoxins of the six superfamilies investigated here.
Identification of a large group of protostome 2FTX proteins
In addition to detecting Argos, the Foldseek search with Tx33.1 also detected a wide variety of 2FTX structures predicted by AlphaFold. This included venom proteins from centipedes containing three 2FTX domains, as well as proteins from turrids (venomous marine snails that, like cone snails, belong to the superfamily Conoidea) that contain two 2FTX domains (see, e.g., UniProt entries P0DQA2 and A0A098LWA8). Using such proteins as query sequences in PSI-BLAST searching uncovered from a single search approximately 1,800 hits among protostomes, e.g., from mollusks, nematodes, tardigrades, and arthropods (including ticks, mites, spiders, ants, crustaceans, wasps, bees, beetles, butterflies, and scorpions). Manual inspection of these sequences showed that a clear majority contain a predicted signal sequence, indicating that they are secretory proteins like Argos. To the best of our knowledge, none of these many 2FTX proteins have previously been described in the literature and at present their function remains unknown. A single exception is the identification of a male-specific protein from the orb-weaving spider Tetragnatha versicolor that exhibits weak sequence similarity to Argos (ref. Zobel-Thropp, et al. 2018).
A CLANS clustering analysis of the sequences obtained from PSI-BLAST searches revealed that they segregate into 12 distinct groups based on sequence similarity (Cluster A-L; Fig. S11A). The AF-predicted structures of a representative sequence from each cluster (Fig. S11B), show that two of the clusters (A and D) encode two-domain 2FTX proteins, while the remaining clusters all encode three-domain 2FTX proteins (see Supporting File 4 for input sequences used in this panel). Not surprisingly, the two clusters that diverge most from the others (K and L) also display structural features rarely observed in the other clusters, such as a greater proportion of loop and α-helix regions.
Argos from D. melanogaster is found in Cluster C, which includes proteins similar to Argos with predicted flexible regions in addition to the three 2FTX domains (Fig. S10), as well as proteins comprising only three 2FTX domains (Cluster C; Fig. S11B). In fact, across clusters the most common overall structure consists of three 2FTX domains without intervening predicted flexible regions (Fig. S11B). Sequence comparisons among a selection of such proteins from a small, but diverse set of species revealed conservation of the cysteine residues in a pattern matching that of the MMLFM conotoxins (Fig. S12). Pairwise sequence identities between Argos and the various proteins in the alignment range from 12% to 26% (25%-38% sequence similarity) in the overlapping region. Despite this relatively low sequence identity across different phyla, certain AF-predicted structures of representative proteins closely resembled the structure of Argos (Fig. 6A), exhibiting the same three-domain architecture of the 2FTX fold (Fig. 6B).

Two proteins exhibiting Argos-like structures were identified in model organisms. The C. elegans M6.4 protein is of unknown function. According to its Wormbase entry (WBGene00019776), there is enriched tissue expression in the anchor cell and neurons, highest developmental expression in the L1 larval stage, and no phenotype observed for gene knockouts. In A. californica we obtained an Argos-like sequence from a Foldseek hit and using this sequence we searched tissue-specific transcriptome datasets from A. californica (ref. Koch, et al. 2023) to generate a tissue expression profile (Fig. S13). The results showed the highest expression in pedal ganglia with generally much lower expression in all other tissues.
Overall, these findings reveal a large, previously unrecognized group of protostome-specific 2FTX proteins, several of which show close structural similarity to Argos.
Investigations of Tx33.1 bioactivity
Finally, we tested bioactivity of Tx33.1 in three experimental systems. First, based on the structural similarity with 3FTXs such as α-bungarotoxin, we tested the activity of Tx33.1 on human α7 and α9α10 nAChRs heterologously expressed in Xenopus oocytes using two-electrode voltage-clamp (TEVC) electrophysiology. Tx33.1 showed minimal inhibitory activity at the concentration (30 μM) tested; the ACh-evoked currents amplitudes mediated by α7 nAChRs were 99 +/− 8% (n=4) of control values and 89 +/− 11% (n=4) for α9α10 nAChRs (data not shown).
In a next set of experiments, we assessed the bioactivity of Tx33.1 using the constellation pharmacology assay, which enables monitoring of Ca2+ influx in primary cell cultures from mouse dorsal root ganglia (DRG) (ref. Teichert et al, 2012). During each experiment, intracellular calcium levels were simultaneously monitored in ~1,000 cells using the Fura-2-AM dye. Dissociated DRG neurons were depolarized using 15 s pulses of high-potassium extracellular solution (30 mM), typically resulting in Ca2+ elevation, as indicated by an increase in the Fura-2 signal. For activity screening, Tx33.1 was applied in the extracellullar solution (at 1 μM and 10 μM) between KCl pulses, and activity was assessed based on significant changes (block or amplification) in the response to the subsequent KCl pulse. No effects were observed at either concentration tested (data not shown).
Lastly, given the structural resemblance between Tx33.1 and the Argos-like protein identified in A. californica, we hypothesized that Tx33.1 might exert a functional effect upon injection into this mollusk. However, we observed no phenotypic effect when assaying gross locomotion in a 30-minute period following injection (data not shown).
Discussion
Here, we analyzed six previously uncharacterized superfamilies of large conotoxins. Using NMR structure determination and AlphaFold prediction, we demonstrate that these conotoxins exhibit a unique structure distinct from other known toxins. The 2FTX fold identified here is also present in a vast number of protostome proteins, suggesting that the Argos-mediated regulatory mechanism for EGFR signaling may be more widespread in nature than previously recognized.
Cone snails belong to the large superfamily Conoidea, which primarily consists of venomous marine snails, including turrids and terebrids. Based on their prey preference for either fish, worms or other mollusks, cone snails are classified as piscivores, vermivores or molluscivores, respectively. The conotoxins of the six superfamilies investigated here are widely distributed among cone snail species and are expressed in cone snails that prey on fish, worms, and mollusks alike. The presence of several putative toxins most closely related to the MMLFM superfamily in the turrid Gemmula speciosa (ref. Watkins, et al. 2006; ref. Gonzales and Saloma 2014) indicates that these toxins were present in the common ancestor of extant cone snails, suggesting their origin within the superfamily Conoidea.
The 2FTX-based structures identified here are reminiscent of the 3FTX fold as previously noted (ref. Klein, et al. 2008). 3FTXs belong to the Ly6/uPAR superfamily of proteins (ref. Kessler, et al. 2017; ref. Leth, et al. 2019; ref. Koludarov, et al. 2023). Several mammalian members of the Ly6/uPAR superfamily are transmembrane proteins belonging to the transforming growth factor (TGF)-β receptor family (ref. Leth, et al. 2019), including the urokinase plasminogen activator receptor and activin type II receptors. Like Argos, these proteins bind growth factor-like domains in their binding partners to exert a variety of cellular functions. Given the predicted structural resemblance to Argos (as illustrated in Fig. 6), we speculate that at least certain protostome Argos-like proteins may modulate EGFR signaling across diverse species. However, because these proteins remain undescribed in the literature, further studies are necessary to verify this hypothesis. The protostome clade encompasses a wide range of organisms across phyla such as arthropods, nematodes, mollusks, flatworms, and annelids, all of which have characterized EGFR signaling pathways (ref. Barberan, et al. 2016). Although it was originally speculated that a human homologue of Argos might exist (ref. Klein, et al. 2008), our PSI-BLAST and FoldSeek searches did not identify any Argos-like proteins in deuterostomes.
Given that Argos-binding of Spitz relies on Domain 3, which is absent from the conotoxins of all six superfamilies investigated here, any conotoxin function involving EGF-type ligand binding would require a markedly different binding mode. Therefore, we consider growth factor binding unlikely for the conotoxins in this study. However, we note the existence of a family of EGF-like conotoxins (unpublished observation) and the identification of the kappa-scoloptoxin Ssd1b from the venom gland transcriptome of the centipede, Scolopendra dehaani, which encodes an Argos-like protein containing three 2FTX domains (ref. Liu, et al. 2012). Therefore, the predation strategy of at least certain cone snails and centipedes may involve disruption of EGFR signaling in their prey.
Based on the structural similarities with 3FTXs and Argos we tested the bioactivity of Tx33.1 on human α7 and human α9α10 nAChR, on mouse dorsal root ganglion cells, and by injection into A. californica. Given the slim chances of target availability for any particular toxin across these diverse systems, the negative outcome of these experiments is not surprising. Consequently, it remains difficult to assess the biological activity of this group of conotoxins at present. Although we did not observe activity on human α7 or α9α10 nAChR, this does not exclude the possibility that these toxins target the cholinergic system, similar to 3FTXs. In any case, for Tx33.1 and other members of the MLSML family, it is tempting to speculate that the distinctive long loop in Finger II plays a functional role, given its close resemblance to the corresponding loop in several long α-neurotoxins that target nAChRs.
Often, weaponization of endophysiological signaling peptides is accompanied by structural adaptations. For instance, 3FTXs lost the transmembrane domain present in their immediate ancestor (ref. Koludarov, et al. 2023), whereas toxins derived from neuropeptides of the CHH family lack an α-helix present in the ancestral genes (ref. Undheim, et al. 2015). Similarly, if the 2FTXs indeed evolved from an endogenous Argos-like protein, these conotoxins may have lost the C-terminal region of their sequences (corresponding to D3 of the cone snail Argos-like protein) during venom recruitment, while simultaneously diverging in sequence. However, due to the differences in intron phases and locations, as well as highly divergent sequences, an evolutionary relationship between the conotoxin families and the endogenous Argos-like protein is not readily apparent.
In conclusion, this study of six uncharacterized families of macro-conotoxins reveals a previously undescribed toxin structure and provides new insight into conotoxin evolution. Our findings also suggest that 2FTX proteins may constitute a large group of overlooked modulators of EGFR signaling in protostomes. Verification of this hypothesis will require further investigation through cellular and organismal studies, such as the examination of EGFR signaling perturbations in null variants of the putative Argos ortholog gene. Methodologically, we propose that combining modern deep learning-based tools for structure prediction, comparison and similarity searching, as applied here, provides a powerful approach for investigating the evolution of rapidly evolving animal venom peptides.
Materials and methods
Sequence mining and analysis
We searched published (ref. Koch, et al. 2024) cone snail venom gland transcriptomes using blastp with standard settings. We also used the online blastp and tblastn algorithms to retrieve homologous sequences of Tx33.1 using standard settings. To discern their superfamily assignments, we clustered these using all-against-all blastp p-values in CLANS (ref. Frickey and Lupas 2004). To determine the toxin gene structures, we predicted the coding regions of MLSML, MMLFM, Unknown, MARFL, unk2, and E-superfamily toxins in C. betulinus, C. ventricosus, and C. canariensis using the protein2genome (p2g) model in Exonerate (v.2.76.2) (ref. Slater and Birney 2005). The annotated sets of known toxins served as queries, and the longest predicted model for each gene was seleceted for further analyses, including identification of intron location and phases.
The amino acid sequence from P. canaliculata encoding a protein structurally similar to Tx33.1 was used as a query in a blastp search against the non-redundant protein sequence database within Conus to identify orthologues. This search identified a precursor transcriptome sequence from C. ammiralis. To identify genomic scaffolds encoding homologous 2FTX proteins, the C. ammiralis precursor sequence was then queried against the published genomes of C. betulinus and C. ventricosus using p2g in Exonerate.
Signal sequences were predicted using SignalP v. 6.0 (ref. Teufel, et al. 2022). Based on the absence of a basic site for proteolytic cleavage before the first cysteine in the sequence, Tx33.1 does not appear to encode a propeptide. The mature sequence is predicted to begin immediately after the signal peptidase cleavage site at Ala18. The predicted mature sequence of Tx33.1 is: KSHTTCPTSTEIDSCSNDNNACGKDVSGSCSSLCNCGNGQTCFTDSNHTITLVPYYTEDGPFEKKYYTCGDPSELDECYDIDKALEVNESDDPNSVEVLCHCPSDKIYLWIHRGYYICITPPQP. Sequences were aligned using the MAFFT server (v. 7) (ref. Katoh, et al. 2002) and visualized using the program Jalview (v. 2.11) (ref. Waterhouse, et al. 2009). To remove sequence redundancy, CD-HIT (ref. Fu, et al. 2012) was used with a 90% identity cut-off.
For the structural similarity analysis in Fig. 2, each of the toxin precursor sequences was predicted using AlphaFold2 (ref. Jumper, et al. 2021) prior to calculating all-against-all structural similarities using Foldseek easy-cluster (ref. Barrio-Hernandez, et al. 2023). The obtained Foldseek e-values were then used as attractive values in the CLANS program. Finally, we reconstructed the phylogeny of the conotoxin superfamilies based on their structural predictions using FoldTree (ref. Moi, et al. 2023), which utilizes Foldseek’s structural alphabet for tree reconstruction.
To determine the tissue distribution of the transcript encoding the Argos-like protein from A. californica (NCBI reference XP_005110430.1), we first identified the nucleotide sequence encoding the protein by performing an NCBI tBLASTn search against the A. californica transcriptome shotgun assembly database. The identified nucleotide sequence (NCBI reference GBDA01016051.1) was subsequently used as a query sequence in a BLASTn search against various neuronal tissue transcriptomes of A. californica, generated as previously described (ref. Koch, et al. 2023). Transcriptomes from two non-neuronal tissues – salivary gland and spermatheca – served as negative controls.
Selection analyses
The analyses were performed identically for the MLSML, MMLFM, Unknown, MARFL, and E-superfamily genes. For each of the cone snail species expressing these toxin genes, one random gene was selected to avoid comparisons between paralogous genes within the same species. The nucleotide sequences were translated with transeq v6.6.0.0, aligned with mafft v7.525 using the linsi algorithm. We converted the protein sequence alignment to the corresponding nucleotide alignment using pal2nal v14 (ref. Suyama, et al. 2006). The gene tree was estimated using iqtree v2.2.2.3, with the nucleotide models of evolution selected based on the Bayesian Information Criterion as follows: JTTDCMut+G4, WAG+G4, WAG+G4, WAG+G4 for MLSML, MMLFM, Unknown, MARFL and E-superfamily, respectively. Both pairwise dN/dS estimations and site-models were run using paml v4.10.7 with codon frequency FmutSel assigning fitness to every codon.
Plasmid generation
A codon-optimized sequence for expression in E. coli encoding mature Tx33.1 was synthesized and inserted into the pET39_Ub19 expression vector (ref. Rogov, et al. 2012). The resulting plasmid (named pLE666; purchased from TWIST Bioscience®) encodes ubiquitin (Ub)-His10-tagged Tx33.1 with a cleavage site for TEV protease located immediately before the mature toxin sequence. The fusion protein produced from pLE666 was named Ub-His10-Tx33.1.
Protein expression
Chemically competent E. coli BL21(DE3) cells were transformed with pLE666 alone or co-transformed with pcsCyDisCo (pLE577) (ref. Nielsen, et al. 2019) encoding Erv1p, hPDI and csPDI to facilitate oxidative folding. The transformed cells were plated on lysogeny broth (LB) agar supplemented with kanamycin (100 μg/ml) (for transformation with pLE666 alone) or kanamycin and chloramphenicol (30 μg/ml) for co-transformations with pLE577. A single colony was transferred into 10 ml LB medium supplemented with the appropriate antibiotic(s) and incubated at 37 °C in an orbital shaker at 200 rpm for ~16 hours. For small-scale expressions, the pre-culture was inoculated into 50 ml LB medium supplemented with appropriate antibiotic(s) to a starting OD600 value of 0.1 and grown at 37°C until the OD600 reached ~0.6. Protein expression was induced by adding isopropyl ß-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM and the culture further propagated at 25°C with shaking (200 rpm) for ~20 hours. For large-scale expression performed in either 0.5 L (M9 minimal medium) or 1 L (LB medium) culture volume, the overnight culture was diluted in LB or M9 minimal medium (for stable isotope labeling; composition: 3 g/liter KH2PO4 , 15.1 g/liter Na2HPO4 12H2O, 5 g/liter NaCl, 1 mM MgSO4, 1 ml/liter M2 Trace element solution (203 g/liter MgCl2 6H2O, 2.1 g/liter CaCl2 2H2O, 2.7 g/liter FeSO4 7H2O, 20 mg/liter AlCl3 6H2O, 10 mg/liter CoSO4 7H2O, 2 mg/liter KCr(SO4)2 12H2O, 2 mg/liter CuCl2 2H2O, 1 mg/liter H3BO4 , 20 mg/liter KI, 20 mg/liter MnSO4 H2O, 1 mg/liter NiSO4 6H2O, 4 mg/liter Na2MoO4 2H2O, 4 mg/liter ZnSO4 7H2O, 21 g/liter citric acid monohydrate), 1 g/liter [15N]NH4Cl, and 4 g/liter [13C]glucose) to an OD600 = 0.1. For cells grown in LB medium, expression was induced as described above. For isotope labeling, cells were first grown in M9 minimal medium containing unlabelled NH4Cl and glucose. Upon reaching OD600 ~ 0.6, cells were pelleted at 3,000 g for 20 minutes and resuspended in 0.5 L M9 medium containing 15NH4Cl and [13C]-glucose. The culture was incubated at 37°C for 30 minutes in an orbital shaker at 200 rpm before induction of protein expression with 1 mM IPTG and overnight incubation at 25°C.
Cell harvest and preparation of lysate
Bacterial cultures were pelleted by centrifugation at 4,000 g for 15 minutes at 4°C. The cells were resuspended in 20 ml lysis buffer (300 mM NaCl, 50 mM NaH2PO4, 20 mM imidazole, pH 8.0) per liter of culture, incubated on a roller for 45 minutes at 4°C and frozen at −20°C overnight. After a freeze-thaw cycle, the lysate was sonicated for 8 cycles of 30 seconds with 30-second intervals in between using a UP200S ultrasonic processor (Hielscher) at 90% power while keeping the sample on ice. The lysate was centrifuged at 30,000 g for 45 minutes at 4°C before filtration through a 0.45 μm syringe filter. Cell pellets were resuspended in the same relative volume of lysis buffer containing 8 M urea to be used for SDS-PAGE analysis.
Protein purification
Ub-His10-Tx33.1 was purified from the cleared cell lysate on an ÄKTA Start chromatography system using IMAC on a 5 ml pre-packed HisTrap column (Cytiva®). Upon application of the lysate, the column was washed with 20 column volumes of lysis buffer before elution of Ub-His10-Tx33.1 with a 0-100% gradient with IMAC elution buffer (300 mM NaCl, 50 mM NaH2PO4, 400 mM imidazole, pH 8.0). After dialysis into storage buffer (300 mM NaCl, 50 mM NaPi, pH 8.0), the Ub-His10-Tx33.1 fusion protein was incubated overnight at room temperature at a ratio of 1:8 with activated His6-TEV protease, produced and purified in-house as described previously (ref. Nielsen, et al. 2019). The cleavage mixture was centrifuged to remove any precipitates and applied to a 5 ml gravity flow column packed with Ni-NTA material (QIAGEN®). The flow-through containing Tx33.1 was collected, whereas the column-bound Ub-His10 tag and His6-TEV protease were eluted with IMAC elution buffer. Tx33.1 was then dialyzed into anion exchange wash buffer (20 mM NaCl, 50 mM NaPi, pH = 5.3). The sample was loaded onto an 8 ml Source 15Q (Cytiva®) column attached to an ÄKTA Pure System, followed by washing for 3 column volumes with the same buffer. Finally, the protein was eluted using a 30-60 % gradient of anion exchange elution buffer (1 M NaCl, 50 mM NaPi, pH 5.3). The fractions containing pure Tx33.1 as judged by SDS-PAGE analysis were pooled and dialyzed into storage buffer.
SDS-PAGE analysis
Protein samples were analyzed under non-reducing conditions on 15% glycine SDS-PAGE gels alongside the PageRuler pre-stained protein ladder (Thermofischer Scientific®). Gels were stained with Coomassie Brilliant Blue, and images were acquired on a ChemiDoc™ MP Imaging System (BioRad®).
Determination of protein concentration
To determine the concentration of the purified Tx33.1, the absorbance was measured at 280 nm on a NanoDrop spectrophotometer (ND-1000, Thermofischer Scientific®). The theoretical extinction coefficient (18,170 M−1 cm−1) was calculated using the web-based Expasy ProtParam tool (ref. Wilkins, et al. 1999).
Mass spectrometry
Mass spectrometry was performed on a Waters Xevo G3 QTof system equipped with a ACQUITY UPLC C4 reversed-phase liquid chromatography column (300 Å, 1.7 μm, 2.1 mm × 50 mm) using a sample of purified Tx33.1 at 5 μg/mL adjusted to pH =2 with trifluoroacetic acid as input.
NMR spectroscopy
A sample of 500 μM 13C,15N-labelled Tx33.1 were prepared in 10 mM 2-(N-morpholino)ethanesulfonic acid (MES), 50 mM NaCl, 10% D2O, pH 6.5. For backbone chemical shift assignments 15N-HSQC, HNCA, HN(CO)CA, CBCA(CO)NH, CBCANH, CBCA(CO)NH, HNCO, and HN(CA)CO spectra were recorded on Bruker Avance III HD spectrometers operating at 600 and 750 MHz spectrometer with triple resonance cryoprobes. Side chain assignments were achieved from HCCONH and HCCH-TOCSY. Distance restraints were obtained from 15N-NOESY-HSQC and 13C-NOESY-HSQC experiments recorded using a mixing time of 120 ms. The triple resonance spectra were recorded with non-uniform sampling at 25% and reconstructed with qMDD (ref. Kazimierczuk and Orekhov 2011). All spectra were processed with nmrPipe (ref. Delaglio, et al. 1995) and analyzed in CCPNMR analysis (ref. Skinner, et al. 2016).
NMR structure calculations
Automated NOE assignment was performed using Cyana (ref. Güntert and Buchner 2015). The NOE list was manually refined. XPLOR-NIH (ref. Schwieters, et al. 2003) was used for the final structural refinement, including a torsion angle database potential (ref. Bermejo, et al. 2012) and an implicit solvent model (ref. Tian, et al. 2017). For the final ensemble representing the solution structure of Tx33.1, the 20 structures with the lowest energy were chosen from 100 calculated structures. The Ramachandran-plot statistics was calculated using PROCHECK (ref. Laskowski, et al. 1996) and the coordinate precisions were calculated by MOLMOL (ref. Koradi, et al. 1996). Structure visualization was performed with PyMOL version 2.5.4 (DeLano Scientific). The structure has been deposited in the Protein Data Bank (PDB, wwpdb.org) with PDB code 9FRQ and chemical shifts and peaks lists from the NOESY spectra have been deposited in the Biological Magnetic Resonance Bank (http://www.bmrb.wisc.edu/) with ID 34921.
Structure prediction and homology searching
Structural predictions using AlphaFold2 or AlphaFold3 were performed with the Google Colab or AlphaFold3 servers (ref. Mirdita, et al. 2022; ref. Abramson, et al. 2024), while structure similarity searches were done using Foldseek (ref. van Kempen, et al. 2024). All AF-models used have pLDDT scores in the range between 0.65 and 0.9, with most above 0.8. Structure superpositions were performed using the “super” command in PyMOL and the listed RMSD values were provided by the same program.
Electrophysiology
Detailed methods for conducting two-electrode voltage-clamp (TEVC) experiments of human nAChRs heterologously expressed in X. laevis oocytes have been previously described (ref. Hone, et al. 2024). Briefly, the oocytes were continuously perfused by gravity at a rate of ~3 ml/min with frog saline composed of 115 mM NaCl, 2.5 mM KCl, 1.8 mM CaCl2, 1.0 mM MgCl2, 1 mg/ml bovine serum albumin and buffered to pH 7.4 with 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES). The perfusion system consisted of a series of solenoid valves (NResearch Inc, Caldwell, NJ, USA) controlled by LabVIEW software (National Instruments, Austin TX, USA) operating on a personal computer. To assess the activity of Tx33.1, the oocyte membranes were clamped at a holding potential of −70 mV using a Warner Instruments Oocyte Clamp OC-725C (Warner Instruments, Hamden, CT, USA) and stimulated with 100 μM acetylcholine (ACh) until stable baseline responses were observed. Oocytes expressing the α7 subtype were pulsed with ACh for 500 ms, to minimize open-channel block by ACh, and 1000 ms for those expressing the α9α10 subtype. Once stable responses were observed, Tx33.1, suspended in water, was manually applied to the oocyte in a static bath for 5 min. The ACh-evoked current amplitudes in the presence of Tx33.1 were compared to currents in response to control applications of water. The data were analyzed using Prism 10.2.4 (GraphPad, La Jolla, CA, USA) and the responses expressed as percent response ± SD. The ACh-evoked currents were acquired and sampled at 50 Hz using a USB-6009 digital acquisition system (National Instruments) and filtered at 5 Hz (FIB1; Frequency Devices, Ottawa, IL, USA).
Calcium imaging
Experiments were performed as previously described (ref. Teichert, et al. 2012). Briefly, lumbar DRG neurons from C57BL/6 mice >30 days old were dissociated by trypsin digestion and mechanical trituration and plated on polylysine-coated plates. The cells were incubated overnight in a 5% CO2 incubator at 37 °C in neuronal culture medium [minimal essential media supplemented with 10% (vol/vol) FBS, penicillin (100 U/mL), streptomycin (100 μg/mL), 1× Glutamax, 10 mM HEPES, and 0.4% (wt/vol) glucose, adjusted to pH 7.4]. One hour before the experiment, cells were loaded with 2.5 μM fura-2-acetoxymethyl ester (Fura-2-AM, Sigma-Aldrich) and incubated at 37 °C. Cells loaded with Fura-2-AM were excited intermittently with 340- and 380-nm light; fluorescence emission was monitored at 510 nm. An image was captured at each excitation wavelength, and the ratio of fluorescence intensities (340/380 nm) was acquired every 2 s to monitor the relative changes in intracellular calcium concentration in each cell as a function of time. Typically, ~1,000 neurons were imaged during each experiment. The calcium transients were elicited by 15 s applications of depolarizing stimulus (30 mM KCl), followed by four washes with extracellular solution [145 mM NaCl, 5 mM KCl, 2 mM CaCl2, 1 mM MgCl2, 1 mM sodium citrate, 10 mM HEPES, and 10 mM glucose, adjusted to pH 7.4]. For activity screening, Tx33.1 was applied in extracellular solution at 1 and 10 μM between KCl pulses. Four different pharmacological agents were used for cell classification: conotoxin κM-RIIIJ (1 μM), AITC (100 μM), menthol (400 μM) and capsaicin (300 nM). The data was acquired using NIS-Elements and further processed with CellProfiler v.3.1.9 (ref. Jones, et al. 2009). Custom-built scripts in Python v.3.7.2 and R were used for further data analysis and visualization. The experimental methods were approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Utah (Protocol number: 17–05017).
Aplysia injections
Juvenile (5-20g) Aplysia californica purchased from the National Resource for Aplysia, University of Miami, were injected with 10 ng toxin as described previously (ref. Espino, et al. 2024). The center-of-mass was tracked using ZebraZoom (https://github.com/oliviermirat/ZebraZoom; (ref. Mirat, et al. 2013)) and the overall movement was quantified for a 30-minute period.
Supplementary Materials
- Supporting File 1: All 50 full-length MLSML superfamily sequences. (PDF) (PDF)
- Supporting File 2: All conotoxin amino acid sequences used for the analyses in Fig. 2. (PDF) (PDF)
- Supporting File 3: Source data related to the boxplot shown in Fig. S5. (XLSX) (XLSX)
- Supporting File 4: Representative sequences from clusters A-L in Fig. S11A used as input for the AlphaFold predictions in Fig. S11B. (PDF) (PDF)
References
- J Abramson, J Adler, J Dunger, R Evans, T Green, A Pritzel, O Ronneberger, L Willmore, AJ Ballard, J Bambrick. Accurate structure prediction of biomolecular interactions with AlphaFold 3.. Nature, 2024. [PubMed]
- S Barberan, JM Martin-Duran, F Cebria. Evolution of the EGFR pathway in Metazoa and its diversification in the planarian Schmidtea mediterranea.. Sci. Rep., 2016. [PubMed]
- I Barrio-Hernandez, J Yeo, J Janes, M Mirdita, CLM Gilchrist, T Wein, M Varadi, S Velankar, P Beltrao, M Steinegger. Clustering predicted structures at the scale of the known protein universe.. Nature, 2023. [PubMed]
- GA Bermejo, GM Clore, CD Schwieters. Smooth statistical torsion angle potential derived from a large conformational database via adaptive kernel density estimation improves the quality of NMR protein structures.. Protein Sci., 2012. [PubMed]
- AB Bertelsen, CM Hackney, CN Bayer, LD Kjelgaard, M Rennig, B Christensen, ES Sørensen, H Safavi-Hemami, T Wulff, L Ellgaard. DisCoTune: versatile auxiliary plasmids for the production of disulphide-containing proteins and peptides in the E. coli T7 system.. Microb. Biotechnol., 2021
- KCF Bordon, CT Cologna, EC Fornari-Baldo, EL Pinheiro-Junior, FA Cerni, FG Amorim, FAP Anjolette, FA Cordeiro, GA Wiezel, IA Cardoso. From Animal Poisons and Venoms to Medicines: Achievements, Challenges and Perspectives in Drug Discovery.. Front. Pharmacol., 2020. [PubMed]
- M Degueldre, M Verdenaud, G Legarda, R Minambres, S Zuniga, M Leblanc, N Gilles, F Ducancel, E De Pauw, L Quinton. Diversity in sequences, post-translational modifications and expected pharmacological activities of toxins from four Conus species revealed by the combination of cutting-edge proteomics, transcriptomics and bioinformatics.. Toxicon, 2017. [PubMed]
- F Delaglio, S Grzesiek, GW Vuister, G Zhu, J Pfeifer, A Bax. NMRPipe: a multidimensional spectral processing system based on UNIX pipes.. J. Biomol. NMR, 1995. [PubMed]
- TF Duda, SR Palumbi. Molecular genetics of ecological diversification: duplication and rapid evolution of toxin genes of the venomous gastropod Conus.. Proc. Natl. Acad. Sci. USA, 1999. [PubMed]
- S Espino, M Watkins, R Probst, TL Koch, K Chase, J Imperial, SD Robinson, P Florez Salcedo, D Taylor, J Gajewiak. chi-Conotoxins are an Evolutionary Innovation of Mollusk-Hunting Cone Snails as a Counter-Adaptation to Prey Defense.. Mol. Biol. Evol., 2024
- A Fedosov, CF Tucci, Y Kantor, S Farhat, N Puillandre. Collaborative Expression: Transcriptomics of Conus virgo Suggests Contribution of Multiple Secretory Glands to Venom Production.. J. Mol. Evol., 2023. [PubMed]
- M Freeman, C Klambt, CS Goodman, GM Rubin. The argos gene encodes a diffusible factor that regulates cell fate decisions in the Drosophila eye.. Cell, 1992. [PubMed]
- T Frickey, A Lupas. CLANS: a Java application for visualizing protein families based on pairwise similarity.. Bioinformatics, 2004. [PubMed]
- L Fu, B Niu, Z Zhu, S Wu, W Li. CD-HIT: accelerated for clustering the next-generation sequencing data.. Bioinformatics, 2012. [PubMed]
- A Gaciarz, NK Khatri, ML Velez-Suberbie, MJ Saaranen, Y Uchida, E Keshavarz-Moore, LW Ruddock. Efficient soluble expression of disulfide bonded proteins in the cytoplasm of Escherichia coli in fed-batch fermentations on chemically defined minimal media.. Microb. Cell Fact., 2017. [PubMed]
- B Gao, C Peng, Y Zhu, Y Sun, T Zhao, Y Huang, Q Shi. High Throughput Identification of Novel Conotoxins from the Vermivorous Oak Cone Snail (Conus quercinus) by Transcriptome Sequencing.. Int. J. Mol. Sci., 2018
- DT Gonzales, CP Saloma. A bioinformatics survey for conotoxin-like sequences in three turrid snail venom duct transcriptomes.. Toxicon, 2014. [PubMed]
- P Güntert, L Buchner. Combined automated NOE assignment and structure calculation with CYANA.. J. Biomol. NMR, 2015. [PubMed]
- CM Hackney, P Florez Salcedo, E Müller, TL Koch, LD Kjelgaard, M Watkins, LG Zachariassen, PS Tuelung, JR McArthur, DJ Adams. A previously unrecognized superfamily of macro-conotoxins includes an inhibitor of the sensory neuron calcium channel Cav2.3.. PLoS Biol., 2023. [PubMed]
- TN Ho, TH Tran, HS Le, RJ Lewis. Advances in the synthesis and engineering of conotoxins.. Eur. J. Med. Chem., 2025. [PubMed]
- AJ Hone, U Santiago, PJ Harvey, B Tekarli, J Gajewiak, DJ Craik, CJ Camacho, JM McIntosh. Design, Synthesis, and Structure-Activity Relationships of Novel Peptide Derivatives of the Severe Acute Respiratory Syndrome-Coronavirus-2 Spike-Protein that Potently Inhibit Nicotinic Acetylcholine Receptors.. J. Med. Chem., 2024. [PubMed]
- K Illergard, DH Ardell, A Elofsson. Structure is three to ten times more conserved than sequence – a study of structural response in protein cores.. Proteins, 2009. [PubMed]
- TNW Jackson, I Koludarov. How the Toxin got its Toxicity.. Front. Pharmacol., 2020. [PubMed]
- TR Jones, AE Carpenter, MR Lamprecht, J Moffat, SJ Silver, JK Grenier, AB Castoreno, US Eggert, DE Root, P Golland. Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning.. Proc. Natl. Acad. Sci. USA, 2009. [PubMed]
- J Jumper, R Evans, A Pritzel, T Green, M Figurnov, O Ronneberger, K Tunyasuvunakool, R Bates, A Zidek, A Potapenko. Highly accurate protein structure prediction with AlphaFold.. Nature, 2021. [PubMed]
- K Katoh, K Misawa, K Kuma, T Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.. Nucleic Acids Res., 2002. [PubMed]
- K Kazimierczuk, VY Orekhov. Accelerated NMR spectroscopy by using compressed sensing.. Angew. Chem. Int. Ed. Engl., 2011. [PubMed]
- P Kessler, P Marchot, M Silva, D Servent. The three-finger toxin fold: a multifunctional structural scaffold able to modulate cholinergic functions.. J. Neurochem., 2017. [PubMed]
- DE Klein, VM Nappi, GT Reeves, SY Shvartsman, MA Lemmon. Argos inhibits epidermal growth factor receptor signalling by ligand sequestration.. Nature, 2004. [PubMed]
- DE Klein, SE Stayrook, F Shi, K Narayan, MA Lemmon. Structural basis for EGFR ligand sequestration by Argos.. Nature, 2008. [PubMed]
- JK Klint, S Senff, NJ Saez, R Seshadri, HY Lau, NS Bende, EA Undheim, LD Rash, M Mobli, GF King. Production of recombinant disulfide-rich venom peptides for structural and functional analysis via expression in the periplasm of E. coli.. PLoS One, 2013. [PubMed]
- TL Koch, SD Robinson, PF Salcedo, K Chase, J Biggs, AE Fedosov, M Yandell, BM Olivera, H Safavi-Hemami. Prey Shifts Drive Venom Evolution in Cone Snails.. Mol. Biol. Evol., 2024
- TL Koch, JP Torres, RP Baskin, PF Salcedo, K Chase, BM Olivera, H Safavi-Hemami. A toxin-based approach to neuropeptide and peptide hormone discovery.. Front. Mol. Neurosci., 2023. [PubMed]
- I Koludarov, T Senoner, TNW Jackson, D Dashevsky, M Heinzinger, SD Aird, B Rost. Domain loss enabled evolution of novel functions in the snake three-finger toxin gene superfamily.. Nat. Commun., 2023. [PubMed]
- R Koradi, M Billeter, K Wüthrich. MOLMOL: A program for display and analysis of macromolecular structures.. J. Mol. Graph., 1996. [PubMed]
- RA Laskowski, JA Rullmannn, MW MacArthur, R Kaptein, JM Thornton. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR.. J. Biomol. NMR, 1996. [PubMed]
- SH Laugesen, DH Chou, H Safavi-Hemami. Unconventional insulins from predators and pathogens.. Nat. Chem. Biol., 2022. [PubMed]
- JM Leth, KZ Leth-Espensen, KK Kristensen, A Kumari, AM Lund Winther, SG Young, M Ploug. Evolution and Medical Significance of LU Domain-Containing Proteins.. Int. J. Mol. Sci., 2019
- ZC Liu, R Zhang, F Zhao, ZM Chen, HW Liu, YJ Wang, P Jiang, Y Zhang, Y Wu, JP Ding. Venomic and transcriptomic analysis of centipede Scolopendra subspinipes dehaani.. J. Proteome Res., 2012. [PubMed]
- Y Lu, F Luo, A Zhou, C Yi, H Chen, J Li, Y Guo, Y Xie, W Zhang, D Lin. Whole-genome sequencing of the invasive golden apple snail Pomacea canaliculata from Asia reveals rapid expansion and adaptive evolution.. Gigascience, 2024
- O Mirat, JR Sternberg, KE Severi, C Wyart. ZebraZoom: an automated program for high-throughput behavioral analysis and categorization.. Front. Neural Circuits, 2013. [PubMed]
- M Mirdita, K Schutze, Y Moriwaki, L Heo, S Ovchinnikov, M Steinegger. ColabFold: making protein folding accessible to all.. Nat. Methods, 2022. [PubMed]
- D Moi, C Bernard, M Steinegger, Y Nevers, M Langleib, C Dessimoz. Structural phylogenetics unravels the evolutionary diversification of communication systems in grampositive bacteria and their viruses.. bioRxiv:2023.2009.2019.558401., 2023
- CB Mourao, EF Schwartz. Protease inhibitors from marine venomous animals and their counterparts in terrestrial venomous animals.. Mar. Drugs, 2013. [PubMed]
- E Müller, CM Hackney, L Ellgaard, JP Morth. High-resolution crystal structure of the Mu8.1 conotoxin from Conus mucronatus.. Acta Crystallogr. F Struct. Biol. Commun., 2023. [PubMed]
- LD Nielsen, MM Foged, A Albert, AB Bertelsen, CL Soltoft, SD Robinson, SV Petersen, AW Purcell, BM Olivera, RS Norton. The three-dimensional structure of an H-superfamily conotoxin reveals a granulin fold arising from a common ICK cysteine framework.. J. Biol. Chem., 2019. [PubMed]
- C Peng, G Yao, BM Gao, CX Fan, C Bian, J Wang, Y Cao, B Wen, Y Zhu, Z Ruan. High-throughput identification of novel conotoxins from the Chinese tubular cone snail (Conus betulinus) by multi-transcriptome sequencing.. Gigascience, 2016. [PubMed]
- T Raffaelli, DT Wilson, S Dutertre, J Giribaldi, I Vetter, SD Robinson, A Thapa, A Widi, A Loukas, NL Daly. Structural analysis of a U-superfamily conotoxin containing a mini-granulin fold: Insights into key features that distinguish between the ICK and granulin folds.. J. Biol. Chem., 2024. [PubMed]
- E Rivera-de-Torre, C Rimbault, TP Jenkins, CV Sorensen, A Damsbo, NJ Saez, Y Duhoo, CM Hackney, L Ellgaard, AH Laustsen. Strategies for Heterologous Expression, Synthesis, and Purification of Animal Venom Toxins.. Front. Bioeng. Biotechnol., 2021. [PubMed]
- SD Robinson, Q Li, A Lu, PK Bandyopadhyay, M Yandell, BM Olivera, H Safavi-Hemami. The Venom Repertoire of Conus gloriamaris (Chemnitz, 1777), the Glory of the Sea.. Mar Drugs, 2017
- SD Robinson, H Safavi-Hemami, LD McIntosh, AW Purcell, RS Norton, AT Papenfuss. Diversity of conotoxin gene superfamilies in the venomous snail, Conus victoriae.. PLoS One, 2014. [PubMed]
- VV Rogov, A Rozenknop, NY Rogova, F Lohr, S Tikole, V Jaravine, P Guntert, I Dikic, V Dötsch. A universal expression tag for structural and functional studies of proteins.. Chembiochem, 2012. [PubMed]
- H Safavi-Hemami, MM Foged, L Ellgaard, M Feige. Evolutionary Adaptations to Cysteine-rich Peptide Folding. In:. Oxidative Folding of Proteins: Basic Principles, Cellular Regulation and Engineering: Royal Society of Chemistry. p., 2018
- H Safavi-Hemami, H Hu, DG Gorasia, PK Bandyopadhyay, PD Veith, ND Young, EC Reynolds, M Yandell, BM Olivera, AW Purcell. Combined proteomic and transcriptomic interrogation of the venom gland of Conus geographus uncovers novel components and functional compartmentalization.. Mol. Cell Proteomics, 2014. [PubMed]
- H Safavi-Hemami, Q Li, RL Jackson, AS Song, W Boomsma, PK Bandyopadhyay, CW Gruber, AW Purcell, M Yandell, BM Olivera. Rapid expansion of the protein disulfide isomerase gene family facilitates the folding of venom peptides.. Proc. Natl. Acad. Sci. USA, 2016. [PubMed]
- CD Schwieters, JJ Kuszewski, N Tjandra, GM Clore. The Xplor-NIH NMR molecular structure determination package.. J. Magn. Reson., 2003. [PubMed]
- SP Skinner, RH Fogh, W Boucher, TJ Ragan, LG Mureddu, GW Vuister. CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis.. J. Biomol. NMR, 2016. [PubMed]
- GS Slater, E Birney. Automated generation of heuristics for biological sequence comparison.. BMC Bioinformatics, 2005. [PubMed]
- M Suyama, D Torrents, P Bork. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments.. Nucleic Acids Res., 2006. [PubMed]
- RW Teichert, NJ Smith, S Raghuraman, D Yoshikami, AR Light, BM Olivera. Functional profiling of neurons through cellular neuropharmacology.. Proc. Natl. Acad. Sci. USA, 2012. [PubMed]
- Y Terrat, D Biass, S Dutertre, P Favreau, M Remm, R Stocklin, D Piquemal, F Ducancel. High-resolution picture of a venom gland transcriptome: case study with the marine snail Conus consors.. Toxicon, 2012. [PubMed]
- F Teufel, JJ Almagro Armenteros, AR Johansen, MH Gislason, SI Pihl, KD Tsirigos, O Winther, S Brunak, G von Heijne, H Nielsen. SignalP 6.0 predicts all five types of signal peptides using protein language models.. Nat. Biotechnol., 2022. [PubMed]
- Y Tian, CD Schwieters, SJ Opella, FM Marassi. High quality NMR structures: a new force field with implicit water and membrane solvation for Xplor-NIH.. J. Biomol. NMR, 2017. [PubMed]
- BM Ueberheide, D Fenyo, PF Alewood, BT Chait. Rapid sensitive analysis of cysteine rich peptide venom components.. Proc. Natl. Acad. Sci. USA, 2009. [PubMed]
- EA Undheim, LL Grimm, CF Low, D Morgenstern, V Herzig, P Zobel-Thropp, SS Pineda, R Habib, S Dziemborowicz, BG Fry. Weaponization of a Hormone: Convergent Recruitment of Hyperglycemic Hormone into the Venom of Arthropod Predators.. Structure, 2015. [PubMed]
- EA Undheim, M Mobli, GF King. Toxin structures as evolutionary tools: Using conserved 3D folds to study the evolution of rapidly evolving peptides.. Bioessays, 2016. [PubMed]
- M van Kempen, SS Kim, C Tumescheit, M Mirdita, J Lee, CLM Gilchrist, J Soding, M Steinegger. Fast and accurate protein structure search with Foldseek.. Nat. Biotechnol., 2024. [PubMed]
- BM von Reumont, G Anderluh, A Antunes, N Ayvazyan, D Beis, F Caliskan, A Crnkovic, M Damm, S Dutertre, L Ellgaard. Modern venomics-Current insights, novel methods, and future perspectives in biological and applied animal venom research.. Gigascience, 2022
- AM Waterhouse, JB Procter, DM Martin, M Clamp, GJ Barton. Jalview Version 2–a multiple sequence alignment editor and analysis workbench.. Bioinformatics, 2009. [PubMed]
- M Watkins, DR Hillyard, BM Olivera. Genes expressed in a turrid venom duct: divergence and similarity to conotoxins.. J. Mol. Evol., 2006. [PubMed]
- MR Wilkins, E Gasteiger, A Bairoch, JC Sanchez, KL Williams, RD Appel, DF Hochstrasser. Protein identification and analysis tools in the ExPASy server.. Methods Mol. Biol., 1999. [PubMed]
- SR Woodward, LJ Cruz, BM Olivera, DR Hillyard. Constant and hypervariable regions in conotoxin propeptides.. EMBO J., 1990. [PubMed]
- Z. Yang. PAML: a program package for phylogenetic analysis by maximum likelihood.. Comput. Appl. Biosci., 1997. [PubMed]
- PA Zobel-Thropp, EA Bulger, MHJ Cordes, GJ Binford, RG Gillespie, MS Brewer. Sexually dimorphic venom proteins in long-jawed orb-weaving spiders (Tetragnatha) comprise novel gene families.. PeerJ, 2018. [PubMed]
- M Zouridakis, P Giastas, E Zarkadas, D Chroni-Tzartou, P Bregestovski, SJ Tzartos. Crystal structures of free and antagonist-bound states of human alpha9 nicotinic receptor extracellular domain.. Nat. Struct. Mol. Biol., 2014. [PubMed]
