Wednesday, May 15, 2013

Delivering the Hard Truth in a Science Talk

Karen Lips writes:
It’s profoundly frustrating to have a platform and a voice, but not to have a clear call to action for the public. A common theme in science communication is that we have to make the audience care. And people do care – a lot! They are eager to help, to offer suggestions, to get involved. But at the end of my talks there is no magic bullet. The truths I have to offer are not easy, they don’t instantly make us feel better. If there is tough love, let’s call what I have to offer “hard hope”.
Part of the research process is to define problems, show how you solve them, and present the knowledge you've created in that cycle of work.  That's the happy ending scenario.  Usually a few minutes are tacked on to discuss the next steps: your current work and unsolved problems.  It's part of a delivery format that most scientists are taught to follow first, and I think it's what Lips' example speaks to.  Scientists aren't taught to give happy endings, but it's important to learn how to do exactly that.

If the problems you face are still huge and potentially unsolvable (like the extinction of species, as Lips writes about), the audience departs on a sad note. 

Most of the time selling science isn't like selling a book or a product: there's no action that makes the audience feel better.  Selling your science is about educating your audience about something new, novel, or useful to them, which helps them in whatever they do, regardless of whether they're researchers or a more general audience.

The hard part is convincing them that you've given something valuable in return for their time.  Only then do they buy into what you're speaking about.

Monday, May 13, 2013

Barns Are Red Because of How Nuclear Fusion Works

Yonathan Zunger offers a tongue-in-cheek, yet accurate explanation of why barns are usually colored red:
The answer ... is “because red paint is cheaper,” which is absolutely true, but it doesn’t really tell you why red paint is cheaper. It clearly isn’t because the Central Committee for the Pricing of Paints has decreed that red shall be in vogue this century, or because of the secret Communist sympathies of early American farmers. In fact, to answer this we have to go all the way to the formation of matter itself.
Stars will burn light elements in a well-established order of fusion reactions, going through stages of burning hydrogen, helium, lithium, and other successively heavier elements:
Until it hits 56. At that point, the reactions simply stop producing energy at all; the star shuts down and collapses without stopping. This collapse raises the pressure even more, and sets off various nuclear reactions which will produce even heavier elements, but they don’t produce any energy: just stuff. These reactions only happen briefly, for a few centuries (or for some reactions, just a few hours!) while the star is collapsing, so they don’t produce very much stuff that’s heavier than 56.
What has 56 nucleons in it and is stable? A mixture of 26 protons and 30 neutrons -- that is, iron.
And it's the iron that ends up in red ochre (Fe2O3, aka hematite), the pigment used in barn paint, explains Zunger.  I have to wonder why the other iron-based ochres are passed over: yellow ochre (Fe2O3•H2O, aka limonite); purple ochre, which is like red ochre but with a coarser particle size; and brown ochre (goethite), which is made of partly hydrated iron oxide (rust).  I'm also surprised at the number of reference books available that explain the basis of colors, like The Chemical History of Color.

The comments have a few other interesting side notes, like this one from Francisco D'Antonia:
There is a specific combination of paint colors made from raw materials that when combined can create all the natural colors of the living world. Its been awhile, but yellow ochre, burnt sienna, cadmium yellow, burnt umber, cobalt blue and titanium white are a few.  I met someone once that worked for a company that made several of the raw colored powders from metal. Fascinating process.
The final question I'm left with is whether red ochre based paints really have that much of a cost advantage over other colors.  I've seen green barns and a blue barn or two, which suggests that not all farmers are the rational price-optimizing paint pickers that Zunger imagines.

Nonetheless, I give two thumbs up for his explanation!

Thursday, May 9, 2013

The Art of Self Reliance

Most complex fields, like science, are collaborative by nature.  People specialize in a field they are talented in and contribute to projects based on their talents.  When another person can do some work that's required better, faster, and cheaper, some coordination of efforts takes place and the project is passed around like a relay baton.  They call this collaboration and it's supposed to be seamless.

In reality, there's a cost to collaboration.  It's the overhead required to coordinate all these separate parts of work: the meetings, identification of work to be shared, and usually some hunting around for the right person to do the experiments in just the right way, followed by informal negotiation of when the work gets done.

Sometimes, the quantum of work to be delegated is so small that it's not even worth spreading the collaboration out.  You simply have to find the best person on your team (sometimes just you) and have them get the job done.  So someone who has never built a Markov Model will learn how to build one, or learn how to prepare next generation sequencing libraries.  Some would argue that, in the long run, this is still an inefficient way to get things done.

But the process is educational, moves you to self-reliance, and builds an appreciation for the difficulties other people live with.  You might even learn that some of the "collaborations" you pitch to prospective partners are much bigger requests of them than you initially imagined.  Displaying sensitivity to their time might even help move them from No to Yes!

Tuesday, May 7, 2013

US Proposal to Replace Peer Review with "Political Review" in Grants Process

Steven Novella, at Science Based Medicine, writes:
[U.S.] Representative Lamar Smith has been developing legislation that would in effect replace the peer-review process by which grants are currently given with a congressional process. Rather than having relevant scientists and experts decide on the merits of proposed research Smith would have politicians decide. It is difficult to imagine a more intrusive and disastrous process for public science funding.
Novella also points out that the three basic tenets of Smith's proposed legislation are reasonable.  That is, funded science must 1) advance the prosperity of the United States, 2) be of the highest quality and be groundbreaking, and 3) not duplicate other research projects.

Sounds good.  We want research that's useful, excellent, and efficiently delivered.

The problem with the first two points is that clear goals and applications need to be defined for research to be useful or deemed to be groundbreaking in order to receive funds.  This generally rules out a lot of academic research, which usually has a clear goal but not necessarily a good application for the knowledge that's to be acquired, while 'groundbreaking' research is usually recognized as such only after the fact.

He also points out that duplication of effort is needed to tackle scientific problems.

To a point, this is true, as I've seen numerous cases of nearly identical articles being published in the same issue of journals.  I tend to believe that even if we were to demand absolutely zero duplication of effort, most 'duplicated' research really attacks the same question using two or more complementary approaches, which makes the end result(s) much more believable, so you could argue that the research isn't really a duplication of effort at all.

Duplication of effort aside, what might be worthwhile to argue for is better coordination between groups interested in the same questions, but that's what conferences are for.

The last point I like in Novella's post is that while some political forces are eager to attack wasteful government spending, even private funding isn't as efficient as some would like to believe:
This can happen with private funding as well. I have seen it happen with disease research. Private charitable organizations raise money to research a disease. The organizers want that money to go to research that will directly benefit patients (who are often their primary donors). But if this prematurely pushes researchers toward clinical studies when we don’t have the basic science sufficiently worked out yet, you end up wasting a lot of time and resources on dead ends.
Just something to keep in mind when trying to reach a research goal, be it academic or applied, before the results are ready to stand on their own.

Friday, May 3, 2013

Genomic Sequencing Companies Continue To Evolve

There's a nice, short review of the evolving genome sequencing market in Nature Reviews Drug Discovery:
Historically, manufacturers have relied on selling sequencing technologies and reagents. Today, Illumina and other leading companies operate complex business models that encompass the manufacture of genomic sequencing technologies, the provision of commercial genomic sequencing services and the sale of products in the informatics and diagnostics markets.
I recently mentioned that sequencing companies are positioning themselves to become the backbone of the medical system, as it's the kind of technology that's suited to having a single point of contact if genomic information is needed from a wide range of samples.

I'm starting to think the trend isn't even limited to human health or research uses; Jay Flatley, Illumina's CEO, quipped on a recent earnings call that "ultimately, ... you're going to be doing genotyping on every cow that's born and using that as a way to triage its future".  Over 30 million calves would make that another huge application that isn't mired in the safety issues relevant to humans.

Returning to the Nature Reviews article, it's important to keep in mind several obstacles that are still blocking genomic technologies from widespread use, in addition to the bottleneck of analyzing all the data, which is the province of computational biologists like myself:
Despite the rapid progress in the development of sequencing strategies, the era of personalized medicine is still a distant goal. Several challenges remain, including the inadequate training of physicians in the area of personalized medicine, attaining the $1,000 genome, enhanced pharmaceutical R&D processes to leverage genomic advances and an international framework for regulating the use of genomic data in the clinic and thereby protecting patient privacy.

Thursday, May 2, 2013

The Art of Fitting Distributions With R

Here's some good advice on fitting distributions using R from Markus Gesmann, a mathematician involved in the analysis of insurance markets.  He makes it clear that it's a bit of an art:
Suppose I have only 50 data points, of which I believe that they follow a log-normal distribution. How much variance can I expect? Well, let's experiment. I draw 50 random numbers from a log-normal distribution, fit the distribution to the sample data and repeat the exercise 50 times and plot the results using the plot function of the fitdistrplus package.
I notice quite a big variance in the results. For some samples other distributions, e.g. logistic, could provide a better fit. You might argue that 50 data points is not a lot of data, but in real life it often is, and hence this little example already shows me that fitting a distribution to data is not just about applying an algorithm, but requires a sound understanding of the process which generated the data as well.
He also republished a handy guide for deciding what distribution your data might belong to, taken from Probabilistic Approaches to Risk by Aswath Damodaran.
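
The experiment Gesmann describes can be roughly sketched in Python using only the standard library. This is a stand-in rather than his R code: it does not use fitdistrplus, the true parameters (mu = 0, sigma = 1) and the seed are assumptions chosen for illustration, and the fit is the plain maximum-likelihood estimate for a log-normal (the mean and standard deviation of the logged data).

```python
import math
import random
import statistics


def fit_lognormal(sample):
    """Maximum-likelihood (mu, sigma) for a log-normal: mean and SD of log(x)."""
    logs = [math.log(x) for x in sample]
    return statistics.fmean(logs), statistics.pstdev(logs)


def repeat_fits(n_points=50, n_repeats=50, mu=0.0, sigma=1.0, seed=1):
    """Draw n_points log-normal values, fit them, and repeat n_repeats times."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    fits = []
    for _ in range(n_repeats):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n_points)]
        fits.append(fit_lognormal(sample))
    return fits


fits = repeat_fits()
mus = [m for m, _ in fits]
sigmas = [s for _, s in fits]
print(f"mu estimates range from {min(mus):.2f} to {max(mus):.2f}")
print(f"sigma estimates range from {min(sigmas):.2f} to {max(sigmas):.2f}")
```

Even though every sample comes from the same distribution, the fitted parameters move around noticeably from one sample of 50 to the next, which is the variance the quote refers to.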

Wednesday, May 1, 2013

At What Point Does More Detail = Less Understanding?

I enjoy visiting Martin Krzywinski's homepage at the BC Genome Sciences Centre from time to time, as it's a fascinating collection of great design ideas for communicating scientific data.  This time around, a presentation on designing effective visualizations in the biological sciences was worth the visit.

One slide caught my eye with a warning that most people probably consider obvious: "DO NOT DIVIDE YOUR SCALE INTO MORE THAN 500 INTERVALS".  Regrettably, I can remember a few biologists that would disagree and try to put everything possible into one intricately prepared figure.


Slide 15 from "Designing Effective Visualizations"


You could half-jokingly claim that most hyperdetailed scales are of limited use, except perhaps for pointing out how not to design a scale.  A scientist might counter with "The figure contains all the data!" but as a tool to communicate a concept, such figures fall short.

It also turns out that designing good biological data visualizations isn't just an aesthetic exercise; it actually has an ironic origin in biology.   The example above reminded me of a very similar example in a book I received as a gift many years ago.

In Hack #34, O'Reilly's Mind Hacks points out that there's a limit to the visual selective attention the mind gives to groups of crowded dots or lines.  Basically, when details are crammed together beyond this limit, the viewer can't willfully focus their attention on any particular detail.  

I tried the examples in Mind Hacks (again) and found that truly, I can't concentrate on something as simple as an individual dot on a crowded field.  The surrounding points draw my attention away from the points I look at, again, and again, and again.

Which leads me to believe that if you're cramming data into scatterplots, I probably won't be able to focus on the points you think are important.


Mind Hacks contains many fun examples of when the average person's perception starts to break down and is a good guide for becoming aware of some very basic limitations of your eyes and brain.

Tuesday, April 30, 2013

Predicting Causes of Drug Side Effects

Why do some drugs cause side effects?

A group of computational biologists led by Peer Bork just published a paper in Molecular Systems Biology describing how to predict the causes of undesirable drug side effects by integrating drug-to-target and drug-to-side effect relationships and asking which protein targets are the most likely cause of the side effect:
For more than half of the investigated side effects, we can predict which proteins cause the side effects upon perturbation by a marketed drug. For the majority of these proteins, we also predict whether their activation or inhibition causes the side effect. 
The paper includes a pretty elegant proof of principle using a mouse experiment. 

The group predicted that increased sensitivity to pain, one side effect of triptan-based migraine medications, was likely caused by off-target activation of serotonin receptors, specifically HTR7.  They confirmed that treating mice with zolmitriptan (sold as Zomig or Zomigon by AstraZeneca) increased their pain sensitivity and then showed the effect could be reversed by the SB-269970 serotonin receptor inhibitor.

The next level of prediction would be to know which individuals are sensitive to these side effects and which ones are not, probably based on the variations they carry in the side-effect targets.

Monday, April 29, 2013

Remembering to Forget

From time to time, we all set aside creative projects that can't be finished in one go.  It might be anything from planning or doing something at work, like drafting a paper or designing experimental strategies, to one of many personal projects, like finding a neighborhood in which to live or simply deciding on a furniture layout.

There comes a time when one becomes fixed on details that aren't important, and the problem exists for its own sake.

When this happens, the project needs to be set aside for a day, a week, or a month so you can forget the unimportant stuff and return with a fresh mind.  In part, that's what a weekend is for. Every so often, details that were all encompassing on Friday yield to a solution on Monday morning.

Remember that some things are best forgotten.

Friday, April 26, 2013

An Analysis of Almost 400 Public Private Partnerships in 2012

Brady Huggett provides a quick overview of partnerships established between academia and industry in 2012 at Bioentrepreneur:
Harvard University and the University of Texas system top the list of universities with the most industry deals, and University College London lands ahead of the University of Oxford and the University of Cambridge.
A few observations:

Until now, I wasn't aware that the University of Texas was such a big place of support for industrial partnerships.  No big Canadian collaborations made it into the tables.

Secondly, the collaborations are dominated by the cancer and infectious diseases fields.  Unfortunately, only 4% of the activity was in gene and cell therapy (my friends at Signals Blog will be disappointed!).

Finally, some of the bigger announcements from the past year are mentioned, namely the CureBeta Initiative translating betatrophin as a possible treatment for diabetes and the Fluidigm/Broad collaboration to establish a Single-Cell Genomics Center.

Thursday, April 25, 2013

How Mature Was the Last Presentation You've Given?

Marc Kuchner, at Soapbox Science, says:
We often argue over the quality of our colleagues’ presentations. When it is hiring time, for example, and faculty candidates are parading through your department, no doubt a common topic of conversation is who gave the best talk. And the maturity level of the research is often a contentious point. With these conversations in mind, I’d like to suggest a numerical scale we can use to describe scientific talks. This scale is not meant to weigh the overall quality of a talk, only to resolve some of the tension between those who prefer solid conclusions and those who enjoy more nebulous forecasting. 
The five point scale ranges from talks "that might one day lead to a testable new hypothesis or new data" to talks that "describe data and calculations that the community recognizes as part of its culture and history".

Though the simplicity of his five point scale is appealing, the higher levels depend on having a really great reputation and more importantly, time for that reputation to have developed.  So unfortunately, a Science Maturity Level 5 presentation is out of reach for most early career researchers.  However, it's still good to keep in mind that good talks don't have to be a flashbang hypothesis-data-conclusion-hypothesis-data-conclusion presentation that makes a scientist sound like an android from the 24th century.

Check out "What's Your Science Maturity Level".

Tuesday, April 23, 2013

From Gene Symbols To Financial Crises, Excel is There

Here's a thing that economists could have learned from computational biologists: Silly mistakes with Microsoft Excel can have a serious impact on your work.

In an op-ed at The New York Times, Paul Krugman shares the story of how a flawed Excel formula contributed to a paper by Carmen Reinhart and Kenneth Rogoff.  The paper stated that debt-to-GDP ratios above 90% contribute to much slower economic growth, and it became ammunition for anyone ideologically bent on cutting government spending, for whatever reason.  Krugman is also the author of popular books on economics like 'End This Depression Now!' and 'The Conscience of a Liberal', which you may have read if you follow him.

Krugman points out that the problem with the Reinhart-Rogoff paper didn't really surface until the financial research community attempted to replicate the results, but couldn't manage to do so:
Other researchers, using seemingly comparable data on debt and growth, couldn’t replicate the Reinhart-Rogoff results. They typically found some correlation between high debt and slow growth — but nothing that looked like a tipping point at 90 percent or, indeed, any particular level of debt.
Finally, Ms. Reinhart and Mr. Rogoff allowed researchers at the University of Massachusetts to look at their original spreadsheet — and the mystery of the irreproducible results was solved. First, they omitted some data; second, they used unusual and highly questionable statistical procedures; and finally, yes, they made an Excel coding error.
Whether they intended to omit data or use unusual statistical procedures is another subject, but it's clear that misuse of Excel was found to be one of the culprits.

Which brings me back to biology.  Why does Excel change gene symbols to dates?

During my PhD, my group sometimes used Excel to format tables for presentations or reports, and we quickly found gene symbols that were automatically converted into dates or scientific notation.  The solution at that point was to fix them by manually escaping the field with an apostrophe as a prefix so that Excel would know to treat the cell as text.  So "DEC1" became "'DEC1" prior to importing the data.  Shortly after we realized that this was a prevalent problem in the field, we considered writing a short paper on it but deemed it too trivial for a paper.

Behold, a few months later, someone publishes a paper on the very same topic in BMC Bioinformatics.  "Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics" summarizes the problems nicely with Table 1 in the paper:


You can see that the Septins, the OCTs (Octamer-binding transcription factors), Deleted in Esophageal Cancer (DEC1 and DEC2) genes, and a few others are affected when converted to date format.

Symbol changes generally don't become a huge problem as long as the data isn't being exported and run through software that uses the gene symbols, but you can appreciate how manipulating biological spreadsheets in Excel can be a pain, especially if one isn't aware that escape characters can be used, let alone what escape characters are.
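
As a rough illustration of the apostrophe fix described above, here is a minimal Python sketch that flags gene symbols resembling dates and prefixes them before they go into a spreadsheet. The pattern is a hypothetical heuristic of my own (a month-like prefix followed by one or two digits); it only approximates Excel's date parsing, it ignores the scientific-notation case, and the function names are made up for this example.

```python
import re

# Hypothetical heuristic: a month-like prefix followed by one or two digits.
# It approximates, but does not reproduce, Excel's own date-parsing rules.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol could plausibly be read as a date."""
    return MONTH_LIKE.match(symbol.strip()) is not None


def escape_for_excel(symbol: str) -> str:
    """Prefix risky symbols with an apostrophe so the cell is treated as text."""
    return "'" + symbol if looks_like_date(symbol) else symbol


symbols = ["DEC1", "SEPT2", "OCT4", "TP53", "BRCA1"]
escaped = [escape_for_excel(s) for s in symbols]
print(escaped)
```

Only the symbols that look like dates are touched, so the rest of a gene list passes through unchanged.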

Moral of the story: It's important to double and triple check your work at each step of the way, otherwise minor coding errors might affect important things like gene lists or government policies!


Monday, April 22, 2013

Two Linked Drugs Are Better Than One

The MIT Technology Review has a short article describing a new strategy for double dosing multiple drugs: linking them together.
Catabasis has found a way around the challenge of identifying a single active molecule that can hit multiple pathways effectively: using a chemical linker to bring together two active molecules. The synergistic effect of the linked molecules may arise from the fact that the two compounds both “get to the right place at the right time.”
The company, Catabasis Pharmaceuticals, is developing a technology that can combine two compounds together using a linker that supposedly puts the drugs into an inactive form in the bloodstream, but which is cut within cells when the whole package is absorbed.

Friday, April 19, 2013

CRASS: Create Relevant Abbreviations for Science Studies

Alex Bond at The Lab and Field, on the misuse of abbreviations in research:
Abbreviations obfuscate meaning, and create a separation between those who know what it means, and those who don’t.  It’s fine to use them colloquially, but for professional correspondence like a work email signature, or affiliation on a manuscript, it looks like crap, and won’t matter to anyone outside the abbreviated institution or country.  
I totally agree.  Abbreviations are thrown around from time to time, but really, if you're going to put out something permanent that people will refer back to, be it an email, a published paper, or a conference presentation, you should spell everything out.  At the very least, do it once in the document.

How many times have you read a paper and found an undefined acronym?   Commonly used methods, like RACE or PCR¹, are especially vulnerable to the assumption that the reader knows what they are.  RACE, for instance, is an acronym with several meanings: "Rapid Amplification of cDNA Ends"; "Row-based ASCII Compatible Encoding"; "Return on Average Common Equity", and many more.

But acronyms (backronyms?) get even funnier when they're force-fit to projects:
And, for goodness sake, please avoid trying to make an existing lab/project/grant title into an acronym by selectively choosing letters to spell out a single word.  Nothing cries out “lame” so much as the Laboratory for Massive Experiments (LaME). You get the idea. 
I won't promise that I'll never make up an acronym like that, but if I do, I'll try to keep it relevant.  When naming a scientific study with an acronym, remember to Create Relevant Abbreviations for Science Studies.

In the end, don't assume your reader knows what you're talking about.  You never know who will be reading what you put out there, so help them understand what you've written.


¹ PCR, for anyone who didn't know, stands for Polymerase Chain Reaction.

Postdoc Income Taxes: How Much Should I Set Aside?

As a short follow up to my post taking you through a hypothetical Canadian postdoc's income tax return, a few colleagues have asked me how much should be set aside from each paycheque to have enough to pay the inevitable tax bill come April.

The strategy is extremely simple: If you're in a situation where your payor does not deduct income taxes from your pay, you need to do it yourself.  Transfer part of each pay into a separate account, like a high interest savings account, or send it to the government yourself (though most people will tell you to wait until April).

What I've done here is assume that you're a postdoctoral fellow who receives T4A (Box 105) Fellowship income, has no other income, is single, has no dependents, and has not paid any income tax through the year.  These are the numbers my 2012 tax software (TurboTax) returns as the Balance Owing.  They should cover many people, but of course your own situation will be slightly different, so I'd say this is best used as a rough estimate of how much to save to reduce the pain at the end of the year.

I've run the numbers through TurboTax for three provinces: Ontario, British Columbia, and Quebec.

Postdoc Salary   Ontario Payable   BC Payable   Quebec Payable
$30,000          $4,117            $3,668       $3,110
$40,000          $6,292            $5,792       $4,363
$50,000          $9,332            $8,537       $6,012
$60,000          $12,447           $11,507      $7,849
$70,000          $15,577           $14,477      $9,686

Here's a quick example:

If you're in Ontario and are paid $40,000, you will pay $6,292 for the year.  If you are paid bi-weekly (26 pay periods per year) you need to set aside $6,292 / 26, or $242 per paycheque, to cover your income taxes.  If you're paid monthly, you'll need to part with $6,292 / 12, or $524 per paycheque.
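
For anyone who would rather not do the division by hand, the same arithmetic can be sketched in a few lines of Python. The figures are copied from the table above; the function name and the choice to round to the nearest dollar are my own assumptions for this example.

```python
# Balance owing by salary and province, copied from the table above
# (2012 TurboTax estimates for the single, no-dependents scenario).
BALANCE_OWING = {
    30_000: {"Ontario": 4_117, "BC": 3_668, "Quebec": 3_110},
    40_000: {"Ontario": 6_292, "BC": 5_792, "Quebec": 4_363},
    50_000: {"Ontario": 9_332, "BC": 8_537, "Quebec": 6_012},
    60_000: {"Ontario": 12_447, "BC": 11_507, "Quebec": 7_849},
    70_000: {"Ontario": 15_577, "BC": 14_477, "Quebec": 9_686},
}


def set_aside_per_pay(salary, province, pay_periods):
    """Amount to set aside from each paycheque, rounded to the nearest dollar."""
    owing = BALANCE_OWING[salary][province]
    return round(owing / pay_periods)


print(set_aside_per_pay(40_000, "Ontario", 26))  # bi-weekly pay
print(set_aside_per_pay(40_000, "Ontario", 12))  # monthly pay
```

Rounding to the nearest dollar can leave you a few dollars short over the year, so rounding up is the safer habit if you want the account to cover the full bill.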