The Insidious Underbelly of Protein Aggregation

It is widely accepted that “high quality” protein is essential at the outset of a target-based, small-molecule drug discovery campaign. But what does “high quality” mean?

Certainly biological activity is a must. A high degree of purity (typically ≥95%) is another crucial requirement. Most often “purity” is defined by the presence of other proteins in the preparation, but the presence of small molecules, whether or not of biological origin, also needs to be considered. The aggregation state, however, is only rarely considered, and yet it significantly impacts the entire drug discovery pipeline.

Figure 1 shows the detrimental effect of protein aggregation on the kinase activity of a target we worked on. Protein received from a third party was heavily aggregated and, although enzymatic activity could be detected, quantitative analysis showed that the specific activity (amount of activity per unit mass of protein) was low. We further purified the protein and isolated the monomeric form, resulting in a >5x increase in the specific activity. Monomeric (or more precisely, monodisperse) protein is also essential for biophysical experiments such as Surface Plasmon Resonance (SPR), which is used to characterize small molecule-protein interactions accurately and sensitively, and for structural biology methods such as X-ray crystallography and cryo-Electron Microscopy used in Structure Based Drug Design.

Figure 1. Effect of protein aggregation on biological activity.
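
To make the definition of specific activity concrete, here is a minimal Python sketch. The function name and all numbers are hypothetical and chosen only for illustration; they are not the measurements behind Figure 1.

```python
def specific_activity(total_activity_units: float, protein_mass_mg: float) -> float:
    """Return activity per unit mass of protein (units per mg)."""
    if protein_mass_mg <= 0:
        raise ValueError("protein mass must be positive")
    return total_activity_units / protein_mass_mg


# Hypothetical values: an aggregated preparation and the monomer isolated from it.
aggregated = specific_activity(total_activity_units=200.0, protein_mass_mg=2.0)
monomeric = specific_activity(total_activity_units=330.0, protein_mass_mg=0.5)

fold_change = monomeric / aggregated
print(f"aggregated: {aggregated:.0f} U/mg, monomeric: {monomeric:.0f} U/mg")
print(f"fold change: {fold_change:.1f}x")
```

The point of the sketch is that total activity can fall during purification while specific activity rises, because inactive aggregated mass has been removed from the denominator.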

How Do I Know if My Protein Sample is Aggregated?

Size Exclusion Chromatography (SEC, also known as gel filtration) is commonly used as a protein purification step. SEC separates protein molecules according to their size and shape by passing a sample through a column packed with porous resin beads. Depending on the size of the pores, a protein of a given hydrodynamic radius, which is correlated with the molecular mass (Mr), may or may not diffuse into the interior of a bead. If it does, its migration through the column is retarded. Thus large proteins, which cannot enter the pores, elute first, followed by proteins of lower molecular weight. By first running protein standards of known Mr through the column, it is possible to estimate the approximate Mr of an unknown protein from its elution volume (through the dependence of the hydrodynamic radius on Mr). Protein aggregates are typically large but variable in size. Aggregates will therefore elute first but, depending on the nature of your SEC column, possibly in a broad peak that overlaps with that of a well-defined, large oligomer (later in this series we will look at a decamer), as shown in Figure 2A. So there can be real uncertainty about whether your protein is aggregated or not.

Figure 2A. Elution profile of a purified protein from an analytical S200 Increase column.

Figures 2B and 2C illustrate a further uncertainty when attempting to determine the Mr using SEC. In Figure 2B, the green chromatogram shows the UV absorbance as a function of elution volume (mL) for a set of standards, and the cyan trace shows a purified client protein that elutes at approximately 13.9 mL. This elution position indicates that the client protein has the same hydrodynamic radius as conalbumin, which has an Mr of 75 kDa. The Mr of the client protein calculated from its amino acid composition is 70 kDa. Here, the close match between the experimental and calculated values indicates that the protein is likely a globular monomer, since the correlation between elution volume and Mr is best for globular proteins. However, the correlation between elution volume and Mr is not always good. Recently, SEC analysis was applied to two related samples (Figure 2C). Based on the elution pattern, one would predict that sample B represents a slightly larger protein than sample C. But this interpretation was inconsistent with data derived from orthogonal methods.

Figure 2B. Determination of the molecular mass (Mr) of an unknown protein by SEC.

Figure 2C. Determination of the Mr of two related protein preparations by SEC.
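
The standards-based estimate used for Figure 2B can be sketched in Python as a straight-line fit of log10(Mr) against elution volume. This is a minimal illustration, not our analysis pipeline. The masses are nominal values commonly quoted for these standards. The elution volumes are invented placeholders, apart from the 13.9 mL position of conalbumin mentioned above, and the function name is hypothetical.

```python
import numpy as np

# name: (nominal Mr in kDa, elution volume in mL).
# Volumes are invented placeholders, except conalbumin at 13.9 mL (from the text).
standards = {
    "ferritin": (440.0, 10.6),
    "aldolase": (158.0, 12.6),
    "conalbumin": (75.0, 13.9),
    "ovalbumin": (44.0, 14.9),
    "carbonic anhydrase": (29.0, 16.1),
    "ribonuclease A": (13.7, 17.4),
}

masses = np.array([mass for mass, _ in standards.values()])
volumes = np.array([volume for _, volume in standards.values()])

# Fit log10(Mr) = slope * volume + intercept over the standards.
slope, intercept = np.polyfit(volumes, np.log10(masses), 1)


def apparent_mr_kda(elution_volume_ml: float) -> float:
    """Apparent Mr (kDa) read off the calibration line at a given elution volume."""
    return float(10 ** (slope * elution_volume_ml + intercept))


print(f"apparent Mr at 13.9 mL: {apparent_mr_kda(13.9):.0f} kDa")
```

A curve like this only reports hydrodynamic size relative to globular standards, so an elongated or otherwise non-globular protein will be assigned the wrong Mr even when the fit itself is good.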

How can we resolve this conundrum?

A Better Way to Size Your Proteins - MALS

As we saw, estimating protein size from the SEC elution profile was both complicated and not necessarily consistent with orthogonally obtained data. What can we do about this? At ZoBio we have attached a multi-angle light scattering (MALS) detector to our chromatography system. Using MALS (in combination with an instrument that measures the differential refractive index, which for simplicity we will ignore here), one can determine the absolute Mr and the dispersity of the sample simultaneously. The dispersity is a measure of the homogeneity of the oligomeric state of the sample: a non-specific aggregate will be highly polydisperse, whereas a protein that exists uniquely as a monomer would be perfectly monodisperse.
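
To show what a dispersity value means in practice, the Python sketch below assumes it is reported as the ratio Mw/Mn, the usual convention, computed over the slices of one peak. It also assumes that each slice comes with a concentration and a molar mass. The function name and the slice values are hypothetical and are not data from our instrument.

```python
import numpy as np


def peak_averages(concentrations, molar_masses):
    """Return (Mw, Mn, dispersity) for one chromatographic peak.

    concentrations: mass concentration in each slice (arbitrary units)
    molar_masses:   molar mass measured in each slice (kDa)
    """
    c = np.asarray(concentrations, dtype=float)
    m = np.asarray(molar_masses, dtype=float)
    mw = np.sum(c * m) / np.sum(c)   # weight-average molar mass
    mn = np.sum(c) / np.sum(c / m)   # number-average molar mass
    return float(mw), float(mn), float(mw / mn)


# Hypothetical slices: the mass falls across an aggregate peak
# and stays constant across a monomer peak.
aggregate = peak_averages([1, 2, 3, 2, 1], [1000, 850, 700, 550, 400])
monomer = peak_averages([1, 2, 3, 2, 1], [70, 70, 70, 70, 70])

print(f"aggregate dispersity: {aggregate[2]:.3f}")
print(f"monomer dispersity:   {monomer[2]:.3f}")
```

A peak whose mass is the same in every slice gives a ratio of 1, and the ratio grows as the mass varies more across the peak.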

Figure 3A shows the SEC chromatogram from our previous post, but now with the MALS trace and the calculated Mr overlaid. With this additional information, we clearly see that the Mr of the first peak decreases strongly as the peak elutes, resulting in significant polydispersity. This is characteristic of non-specifically aggregated protein, and the UV trace in blue shows that it represents the majority of the protein in the sample. A peak of well-behaved protein (peak 3), consistent with a monodisperse heterodimer, elutes last, while a peak of more polydisperse protein (peak 2) elutes just ahead of it. The Mr of peak 2 (~84 kDa) suggests a partially aggregated protein that might serve to nucleate the highly aggregated species in peak 1.

Figure 3A. SEC-MALS analysis of a purified protein. Peak 1 includes species with Mr ranging from 0.4 to 1 MDa and has a high polydispersity (1.15), whereas peak 3 has an Mr of ~70 kDa and a low polydispersity (1.002), suggesting it is a well-behaved heterodimer, as expected. Peak 2 exhibits an Mr of ~84 kDa and an increased polydispersity (1.02), suggesting it consists of partially aggregated protein.

With the increased accuracy and information content of the SEC-MALS analysis, we can revisit our confusing sample from the SEC analysis in our previous post (Figure 3B). Here we see that, much to our surprise, the more quickly eluting sample A has an Mr of 13.5 kDa, while that of sample B is 25.5 kDa. The SEC-MALS data are consistent with our other data and helped to provide insight into the solution behaviour. Our interpretation of the anomalous elution pattern is that the protein is a monomer in the buffer conditions of sample A and a dimer in those of sample B. Speculatively, the shape of the dimer may be much more asymmetric than that of the monomer, although that is not sufficient to fully explain the elution pattern.

Figure 3B. SEC-MALS analysis of the Mr of two related protein preparations. Sample A Mr=13.5 kDa (blue) and sample B Mr=25.5 kDa (red).
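
The step from an absolute Mr to an oligomeric state is simple arithmetic, sketched here in Python. For illustration we assume the monomer mass equals the 13.5 kDa measured for sample A; in practice the sequence-derived mass would be used. The function name is hypothetical.

```python
def oligomeric_state(measured_mr_kda: float, monomer_mr_kda: float):
    """Return the nearest whole number of subunits and the relative deviation from it."""
    ratio = measured_mr_kda / monomer_mr_kda
    n_subunits = max(1, round(ratio))
    deviation = abs(ratio - n_subunits) / n_subunits
    return n_subunits, deviation


MONOMER_MR_KDA = 13.5  # assumption: taken from sample A, not from the sequence

for name, mr in [("sample A", 13.5), ("sample B", 25.5)]:
    n, dev = oligomeric_state(mr, MONOMER_MR_KDA)
    print(f"{name}: {mr} kDa -> {n} subunit(s), deviation {dev:.1%}")
```

A large deviation from a whole number would suggest a mixture of states or an incorrect monomer mass, rather than a single well-defined oligomer.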

My Protein Wants to Aggregate: How Do I Prevent It?

As we’ve seen, SEC-MALS is an accurate, information-rich analytical approach to determining the size and polydispersity of protein samples. So how can we use it to solve protein formulation problems and obtain well-behaved samples for assays and structural biology? Here’s an example. In one client project, the crystal structure of a fragment hit bound to the target protein was obtained at high resolution. However, the compound bound at a clearly artifactual site created by crystal lattice contacts. To overcome this challenge, we turned to solution NMR spectroscopy to determine the structure of the complex. The first NMR spectrum we acquired was nearly empty (Figure 4A, “Initial Condition”).

Since SEC was not used in the provided purification protocol, we used SEC-MALS to characterize the NMR sample. We found a mix of two monodisperse peaks, of which ~10% represented a monomer and ~90% a decamer. The 400 kDa decameric form results in very fast relaxation (disappearance) of the NMR signal, which explains the first spectrum. Since the protein was active and some reports in the literature suggested this might be a biologically relevant form, we screened for buffer conditions that might convert the decamer to a monomer, using SEC-MALS as the readout. As Figure 4A shows, we succeeded in producing a sample that was ~80% monomeric and in which, importantly, the two forms did not interconvert. We could then isolate the monomer, which yielded a sample that provided very high quality NMR spectra and was stable for many months! With these conditions we could solve the structure and determine where the compound bound.

Figure 4A. SEC-MALS as a tool for optimizing protein samples for structural biology.

However, SEC-MALS is a low-throughput method that consumes large quantities of protein in screening mode, so we sought a more efficient method to optimize buffer conditions for each protein. For this we used an instrument that simultaneously determines the thermal denaturation (i.e. melting temperature) and the size of particles through light scattering. This allows us to look at the effect of buffer components on protein stability and aggregation in a sample volume of only 15 µL. Figure 4B shows how we used this instrument to screen 146 buffer conditions to find one that was more physiologically relevant. The initial buffer contained 300 mM NaCl, which was necessary to prevent aggregation. However, the reported protein binding partner was not well behaved under these conditions, and it was not possible to demonstrate the interaction. With the screen we found that 100 mM citrate at pH 6 significantly increased the stability of the target, which remained monodisperse. Since the proposed binding partner was also well behaved in this buffer, we could properly assay the interaction using our Surface Plasmon Resonance (SPR) instrumentation. We could then unequivocally demonstrate that the interaction did not occur, thereby preventing a wasted campaign.

Figure 4B. Buffer screen to improve protein stability and monodispersity using thermal denaturation and particle sizing as simultaneous readouts.
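
One way to combine the two readouts of such a screen is sketched below in Python: first discard conditions that look aggregated, then rank the rest by melting temperature. This is an illustration of the idea, not the instrument's software or our actual selection procedure. The field names, thresholds and all values are hypothetical, and only three of the 146 conditions are mimicked.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    name: str
    tm_celsius: float     # melting temperature from thermal denaturation
    radius_nm: float      # apparent particle radius from light scattering
    polydispersity: float  # width of the size distribution (hypothetical scale)


def rank_conditions(conditions, max_radius_nm, max_polydispersity):
    """Keep conditions that look monodisperse, then sort by Tm, highest first."""
    well_behaved = [
        c for c in conditions
        if c.radius_nm <= max_radius_nm and c.polydispersity <= max_polydispersity
    ]
    return sorted(well_behaved, key=lambda c: c.tm_celsius, reverse=True)


# Invented example values for three buffer conditions.
screen = [
    Condition("300 mM NaCl", tm_celsius=48.0, radius_nm=3.1, polydispersity=0.08),
    Condition("50 mM NaCl", tm_celsius=46.0, radius_nm=25.0, polydispersity=0.45),
    Condition("100 mM citrate pH 6", tm_celsius=55.0, radius_nm=3.0, polydispersity=0.07),
]

ranked = rank_conditions(screen, max_radius_nm=5.0, max_polydispersity=0.2)
for c in ranked:
    print(f"{c.name}: Tm {c.tm_celsius:.1f} C, radius {c.radius_nm:.1f} nm")
```

Filtering on size before ranking on stability reflects the priority described above: a condition that raises the melting temperature is of little use if the protein is no longer monodisperse in it.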