Tuesday, April 22, 2014

Where are the STAP cells?

Signals Blog just posted a short commentary I wrote on STAP cells and why I think they're too good to be true:
Esophageal cancer is the end point on a spectrum of diseases. At the beginning, chronic acid reflux exposes the esophagus to the low pH levels of stomach acid, which is a risk factor for Barrett’s Esophagus. Barrett’s, in turn, is a risk factor for developing this type of cancer, which may take several decades to appear.

My original thoughts when STAP cells were first reported were that there would be a clear link between acidic conditions and stemness. ... But there’s no overwhelming evidence to suggest that that happens.
Acid reflux seems like the ideal natural experiment to prove that STAP cells exist, yet the esophagus doesn't turn into a mess of iPS cells every time someone gets heartburn.

Read on at Signals.

Monday, April 14, 2014

Called It: Artificial Blood from iPS Cells

Genetic Engineering & Biotechnology News reports that the Scottish National Blood Transfusion Service is looking into the safety of stem cell derived blood:
A team of scientists led by SNBTS director Marc Turner, M.D., Ph.D. is heading the project, which reflects the combined efforts of the BloodPharma consortium, which has benefited from a Strategic Award, provided by the Wellcome Trust, in the amount of £5 million (approximately $8.4 million). 

The research funded by the award involves multiplying and converting pluripotent stem cells into red blood cells for use in humans, with the aim of making the process scalable for manufacture on a commercial scale.
It's a study to test transfusions using small amounts of blood (5 mL), but it will nevertheless be real stem cell-derived blood.

I called it back in 2010.

Friday, April 11, 2014

Answer the Why of Your Work

All too often, scientists (and other research-minded people) are drawn into a never-ending spiral of questions.  Answers lead to questions, which lead to more answers, until someone inevitably describes the next line of inquiry and caps off their thoughts with "We need to do experiments to answer these questions."

The problem that usually arises is that no one objects.

Why not? It's easier to let someone go ahead and do their work than it is to stop and think about other things that can be done.

But assuming they've already decided that the questions are worthy of work, it should be easy for them to articulate why those questions need to be answered and why now is a good time to answer them.  Is it because there's a key conundrum in your field of specialization?  Will the answer tell us something useful about a disease, a key point about cell biology, or a physical process?  On an extremely practical level, will your answer contribute to a publishable paper or to getting a grant?

Or, most commonly, will your answer tell you that Gene X, amongst 20,000 genes, goes up or down because you poked a particular cell the right way?  That, too, may be important, but you need to state why.

The reality is that not all questions need to be answered, at least not immediately.  Unanswered questions can simmer for a little while longer.

Monday, April 7, 2014

Big Data Sets, Multiple Hypothesis Testing, and Choices

Jason McDermott, at The Mad Scientist's Confectioners Club, writes:
Here’s where the problem of a false dichotomy occurs. Many researchers who analyze large amounts of data believe that utilizing a hypothesis-based approach mitigates the effect of multiple hypothesis testing on their results. That is, they believe that they can focus their investigation of the data to a subset constrained by a model/hypothesis and thus reduce the effect that multiple hypothesis testing has on their analysis. Instead of looking at 10,000 proteins in a study they now look at only the 25 proteins that are thought to be present in a particular pathway of interest (where the pathway here represent the model based on existing knowledge). ... All well and good EXCEPT for the fact that the actual chance of detecting something by random chance HASN’T changed.
The article in its entirety is a good read, especially in describing the use of big data sets as a balance between hypothesis-driven projects and discovery-driven ones.  The former can loosely be described as "research" in its classical sense, while the latter is sometimes derided as "a fishing expedition".  Both approaches can be useful, as long as you're honest with yourself and know what you're dealing with.

But the quote above isn't exactly accurate.  In the hypothetical 10,000-protein experiment, the chance of detecting any one protein as significant by chance is the same whether you're looking at a subset of 25 or 250.  Given that constant per-test probability, the chance of finding something "significant" by chance alone is much greater across the whole set of 10,000 than across 25.  That 10,000-protein data set is where multiple testing correction is drastically needed.  You still need correction with 25, but simple methods are usually adequate.  Picking the right way to correct your results is tricky: I've seen large experiments designed as fishing expeditions fail to detect known, real effects in the data set with statistical significance, even after multiple testing correction was done.
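To put a number on that, here's a quick Python simulation (with invented numbers, purely for illustration): the per-protein false positive rate sits at alpha no matter how many proteins you test, but the chance of at least one false hit somewhere in the set climbs quickly as the set grows, which is exactly what a correction like Bonferroni has to rein in.

```python
# Sketch with made-up numbers: per-test false positive rate stays constant,
# but the chance of at least one false positive grows with the number of tests.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05

for n_tests in (25, 250, 10_000):
    # p-values for proteins with NO real effect are uniform on [0, 1];
    # simulate 1,000 hypothetical experiments at once.
    p_values = rng.uniform(size=(1_000, n_tests))

    per_test_rate = (p_values < alpha).mean()                        # ~0.05 regardless of n_tests
    any_uncorrected = (p_values < alpha).any(axis=1).mean()          # grows toward 1.0
    any_bonferroni = (p_values < alpha / n_tests).any(axis=1).mean() # held near 0.05

    print(f"{n_tests:>6} tests: per-test {per_test_rate:.3f}, "
          f"any uncorrected {any_uncorrected:.3f}, any Bonferroni {any_bonferroni:.3f}")
```

With 25 tests you already have roughly a 70% chance of at least one spurious hit; with 10,000 it's a near certainty, unless you correct for it.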

So if you know what you're looking for and have a specific question in mind, you can make multiple hypothesis testing work for you.  You won't have your big data set dilute away all your interesting observations.
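The flip side can be sketched just as simply (again with an invented p-value): a modest but real effect can clear a Bonferroni threshold when you've restricted yourself to the 25 proteins you actually care about, but gets washed out when it's one test among 10,000.

```python
# Sketch: the same hypothetical real effect passes correction in a focused
# 25-protein test but not when buried in a 10,000-protein screen.
alpha = 0.05
p_real_effect = 0.001  # a modest but genuine signal (invented for illustration)

for n_tests in (25, 10_000):
    threshold = alpha / n_tests  # Bonferroni-corrected significance threshold
    verdict = "significant" if p_real_effect < threshold else "not significant"
    print(f"{n_tests:>6} tests: threshold {threshold:.2e} -> {verdict}")
# 25 tests: threshold 2.00e-03 -> significant
# 10,000 tests: threshold 5.00e-06 -> not significant
```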

Having something very specific to act on also means you're less likely to be fooled by chance and drawn down a path that's "significant".  You're free to restrict your observations to a more specific set of data and choose to look at any set of measurements based on the question at hand, and not the other way around. 

Of course, the decision to ask a specific question should be made before seeing the data in its entirety, not after the fact when something "looks good", but that's another issue altogether.