Tuesday, 9 May 2017

Dose-response curve in R: the drc package

Thanks to the analysis of an interesting dataset, I discovered the drc package for dose-response analysis in R (Ritz et al., PLOS ONE 2015, https://doi.org/10.1371/journal.pone.0146021).

It comes with a very powerful optimiser for common models such as the logistic function (or Hill function). Compared with the native R approach of using the nls function with self-starting models such as SSfpl, the drc package is far more reliable and robust: both initial parameter estimation and optimisation ran without singularity errors, at least on the ~3,000 datasets that I tried. drc did not report a single error on them, whereas nls failed in as many as 600 cases, even when I set the starting parameters manually with educated guesses.

I still have to understand how the package achieves such good performance. However, I am already very glad that we finally have a robust and reliable optimiser for curve fitting, which is a common task in computational biology and bioinformatics.

Figure: a 4-parameter logistic fit done and plotted with drc.
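
For readers who have not met the model before, the curve being fitted has a simple closed form. The snippet below is only a plain-Python sketch of the four-parameter log-logistic function (slope, lower limit, upper limit, ED50); it is not drc code and does no fitting. The function name, argument names and the parameter values are made up for illustration.

```python
def four_param_logistic(dose, slope, lower, upper, ed50):
    """Response of a four-parameter log-logistic curve at a given dose.

    f(dose) = lower + (upper - lower) / (1 + (dose / ed50) ** slope)
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    if dose == 0 and slope < 0:
        # (0 / ed50) ** slope diverges for a negative slope,
        # so the response tends to the lower limit.
        return lower
    return lower + (upper - lower) / (1.0 + (dose / ed50) ** slope)


# Hypothetical parameters: the response falls from 7.5 to 0.5,
# and is halfway between the two limits at dose == ed50.
params = dict(slope=2.0, lower=0.5, upper=7.5, ed50=3.0)

for dose in [0.0, 0.3, 3.0, 30.0]:
    print(dose, four_param_logistic(dose, **params))
```

With a positive slope the curve decreases from the upper limit at dose 0 towards the lower limit at high doses; a fitting routine such as drc or nls has to estimate these four numbers from noisy data.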

P.S. During the trial-and-error process, I also came across a website that fits curves quite robustly: https://www.mycurvefit.com. I will not use it, since I need programmatic access to the fitting, but in my opinion it did well, at least on the few examples I tried on which nls had failed.

Monday, 8 May 2017

t-distributed Stochastic Neighbor Embedding (t-SNE)

Notes taken while following the video lecture on t-SNE given by Laurens van der Maaten as a Google Tech Talk, available on YouTube.

  • t-SNE has the advantage over other dimension reduction methods that it tries to keep local structures when mapping high-dimensional spaces to 2D or 3D.
  • In the high-dimensional (hD) space, t-SNE uses multivariate Gaussian (normal) distributions to assign pairwise probabilities to data points.
  • In the low-dimensional (lD) space, t-SNE uses a Student's t-distribution with one degree of freedom over the Euclidean distances to assign pairwise probabilities to data points.
  • In order to make the points in lD reflect their relationship in hD, the Kullback–Leibler divergence between the two distributions is minimised.
  • The reason to use the heavy-tailed Student's t-distribution in lD is that it lets moderately dissimilar points be placed farther apart in the map without a large penalty, which relieves crowding (TODO: I have not yet completely understood why).
  • In real-life applications, the Barnes-Hut approximation and quadtree implementations make the algorithm efficient enough to handle reasonably large data sets with thousands of samples.
  • Relation to physics: the t-SNE gradient can be viewed as a simulation of the N-body problem, with a spring term, an exertion/compression term, and a sum. (TODO: The N-body problem is new to me).
  • As an extension, multi-map t-SNE allows representation of correlated features in different contexts.
  • t-SNE can be used to cluster samples and to assist feature selection.
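
To make the notes above a bit more concrete, here is a toy sketch in plain Python of the quantity t-SNE minimises. It is not the real algorithm: I assume a single fixed Gaussian bandwidth (sigma) instead of the per-point bandwidths that t-SNE derives from the perplexity, I normalise over all pairs at once, and there is no gradient descent or Barnes-Hut approximation. All function names and the example points are made up for illustration.

```python
import math


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def normalise(weights):
    """Zero the diagonal and scale the remaining weights to sum to 1."""
    n = len(weights)
    total = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    return [[weights[i][j] / total if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def gaussian_similarities(points, sigma=1.0):
    """hD side: Gaussian kernel (simplified: one fixed sigma for all points)."""
    d2 = squared_distances(points)
    return normalise([[math.exp(-d / (2.0 * sigma ** 2)) for d in row]
                      for row in d2])


def student_t_similarities(points):
    """lD side: Student's t kernel with one degree of freedom."""
    d2 = squared_distances(points)
    return normalise([[1.0 / (1.0 + d) for d in row] for row in d2])


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(P || Q) over all off-diagonal pairs."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n)
               if i != j and p[i][j] > 0.0)


# Two tight pairs of points that are far apart in a 3D "high-dimensional" space.
high_dim = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]

# Two candidate 2D maps: one keeps the neighbours together, one mixes them up.
good_map = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
bad_map = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (11.0, 0.0)]

p = gaussian_similarities(high_dim)
print("KL for good map:", kl_divergence(p, student_t_similarities(good_map)))
print("KL for bad map: ", kl_divergence(p, student_t_similarities(bad_map)))
```

The map that keeps each pair of neighbours together gets the smaller divergence, which is the sense in which minimising the Kullback–Leibler divergence preserves local structure. The real algorithm searches for the map coordinates by gradient descent instead of comparing two hand-made candidates.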

Saturday, 23 July 2016

Weekly summary

This week

  • I had an intensive learning phase of Python using the Python Tutorial, The Hitchhiker's Guide to Python and countless other internet resources. I found the following projects especially interesting: nose for test-driven development, Sphinx for generating documentation, and Biopython for bioinformatics tasks.
    • I thought about when to use R, Python, and C/C++ appropriately and most effectively. I think R is very good for prototyping tools that combine statistics and visualization. Python is an excellent general-purpose scripting language with a large code base. C/C++, being quite complex but efficient and powerful, remains my choice when it comes to optimizing performance.
  • I spent some thought on how to integrate several layers of omics data. The paper sent by my colleague Klas Hatje may be of interest to those who work in this field: Integrating Transcriptomic and Proteomic Data Using Predictive Regulatory Network Models of Host Response to Pathogens by Chasman et al.
  • My colleague Nikolaus Berntenis let me know about Paintomics, developed by another colleague, Fernando Garcia-Alcade, and his group. The web tool seems to be able to visualize multi-omics datasets using KEGG graphics.

Thursday, 14 July 2016

ANOVA (Doncaster and Davey): ANOVA model structures

Seven principal classes of ANOVA designs, with up to three treatment factors:

  1. One-factor: replicate measures at each level of a single explanatory factor
  2. Nested: one factor nested in one or more other factors
  3. Factorial: fully replicated measures on two or more crossed factors
  4. Randomised blocks: repeated measures on spatial or temporal groups of sampling units
  5. Split plot: treatments applied at multiple spatial or temporal scales
  6. Repeated measures: subjects repeatedly measured or tested in temporal or spatial sequence
  7. Unreplicated factorial: a single measure per combination of two or more factors

Monday, 7 April 2014

Curious "cyclic namespace dependency" error

Dear R users, here I report a curious case of a "cyclic namespace dependency" error and its solution, in case you run into the same problem, which confused me a lot.

In my case, I accidentally created an S4 method and an ordinary function with the same name. The package could be installed. However, when I tried to load the package, it printed the following error message:

Loading required package: cycle

Error in loadNamespace(package, c(which.lib.loc, lib.loc)) :
  cyclic namespace dependency detected when loading 'cycle', already loading 'cycle'

It looks like this package has a loading problem: see the messages for details.

For a minimal piece of code see https://gist.github.com/anonymous/10018579. When the code is put in an R file in the R package, and "export(myMethod)" is specified in NAMESPACE, checking or loading the package will fail due to the "cyclic namespace dependency".

The error message is unfortunately not particularly helpful. Normally it would point to a problem of reciprocal dependency, but in this case it is caused only by a name that is given to both an S4 method and an ordinary function.

Solving the issue is straightforward: check carefully whether you have duplicated function/method names and rename them if there are any.

Sunday, 13 January 2013

Book review: Mindset, the new psychology of success

Carol S. Dweck, Ph.D. Mindset, the new psychology of success, Ballantine Books 2008

In her book, Dweck proposes and contrasts two types of mindset: the fixed mindset and the growth mindset. Whereas the fixed mindset tends to see traits (such as intelligence, or achievement in sport) as fixed (or inborn) and desires to achieve success (and avoid failure) with little effort, the growth mindset holds that intelligence and other abilities can be developed through deliberate effort.

Though the book carries the subtitle "the new psychology of success", it does not at all teach the pursuit of the fame, money or social status that are commonly associated with "success". Rather, it is about finding things that interest and challenge you, and constantly learning and developing yourself: persisting in the face of setbacks, growing from criticism and feedback, and learning lessons from the success of others. Stretching yourself whenever things feel (too) easy or comfortable is valuable even for those who are talented (see the countless cases in the book). It is motivating to see oneself as an unfinished human being who must learn and make an effort every day.

The book was easy to read thanks to its plain writing style, although it packs numerous cases into fewer than 250 pages. I find that the contrast between fixed and growth mindsets is too often presented and interpreted too simply. The author demonstrates the two mindsets in many areas, including life, relationships, sport, and career, with many vivid examples. On the one hand this makes the book potentially useful for many readers; on the other hand it leaves many important questions discussed only superficially or not addressed at all. For instance, what are the neurological mechanisms that lead to the mindsets, how are they determined in early childhood, and how are they modulated by diseases such as schizophrenia or depression? In many examples the author divides people into those with the fixed mindset and those with the growth mindset, though she points out that the two are seldom clearly separated but rather intertwined in our personalities. Then how can one measure the "fixed-ness" or the "growth-ness"? Are they correlated with other psychological measures?

I may have overlooked references in the 20 pages of notes that discuss these issues or point to work that describes them, or the book is perhaps too easy-going and too popular in style to address these questions. Yet as light reading it inspired and motivated me to stand up to some of my own problems and to invest more of my life in the things that I care about. In short, I would recommend it, with 3.5 points out of 5.

P.S. Link to Goodreads (rating 4 out of 5): http://www.goodreads.com/book/show/40745.Mindset