Samstag, 23. Juli 2016

Weekly summary

This week

  • I had an intensive Python learning phase, using the Python Tutorial, The Hitchhiker's Guide to Python and countless other internet resources. I found the following projects especially interesting: nose for test-driven development, sphinx for generating documentation, and BioPython for bioinformatics tasks.
    • I thought about when to use R, Python, and C/C++ appropriately and most effectively. I think R is very good for prototyping tools that combine statistics and visualization. Python is an excellent general-purpose scripting language with a large code base. C/C++, being quite complex but efficient and powerful, remains my choice when it comes to optimizing performance.
  • I spent some thought on how to integrate several layers of omics data. The paper sent by my colleague Klas Hatje may be of interest to those who work in this field: Integrating Transcriptomic and Proteomic Data Using Predictive Regulatory Network Models of Host Response to Pathogens by Chasman et al.
  • My colleague Nikolaus Berntenis told me about Paintomics, developed by another colleague, Fernando Garcia-Alcade, and his group. The web tool seems able to visualize multi-omics datasets using KEGG graphics.

Donnerstag, 14. Juli 2016

ANOVA (Doncaster and Davey): ANOVA model structures

Seven principal classes of ANOVA designs, with up to three treatment factors:

  1. One-factor: replicate measures at each level of a single explanatory factor
  2. Nested: one factor nested in one or more other factors
  3. Factorial: fully replicated measures on two or more crossed factors
  4. Randomised blocks: repeated measures on spatial or temporal groups of sampling units
  5. Split plot: treatments applied at multiple spatial or temporal scales
  6. Repeated measures: subjects repeatedly measured or tested in temporal or spatial sequence
  7. Unreplicated factorial: a single measure per combination of two or more factors
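As a minimal illustration of the first design, a one-factor ANOVA on replicate measures can be computed by hand in a few lines (the factor levels and data below are invented for illustration, not from the book):

```python
# Hypothetical replicate measures at three levels of a single factor
groups = {
    "low":    [4.1, 3.9, 4.5, 4.0],
    "medium": [5.2, 5.8, 5.5, 5.1],
    "high":   [3.2, 3.0, 3.6, 3.1],
}

n_total = sum(len(g) for g in groups.values())
k = len(groups)
grand_mean = sum(x for g in groups.values() for x in g) / n_total

# Between-group (treatment) and within-group (error) sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in groups.values() for x in g)

# Mean squares and the F ratio, with k-1 and n-k degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_ratio = ms_between / ms_within
print(f"F({k - 1}, {n_total - k}) = {f_ratio:.2f}")
```

The other six designs differ in how the total sum of squares is partitioned (nesting, crossing, blocking), but the same decomposition into treatment and error mean squares underlies them all.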

Montag, 7. April 2014

Curious "cyclic namespace dependency" error

Dear R users, here I report a curious case of the "cyclic namespace dependency" error and its solution, in case you run into the same problem, which confused me a lot.

In my case, I accidentally created an S4 method and an ordinary function with the same name. The package could be installed. However, when I tried to load the package, R printed the following error message:

Loading required package: cycle

Error in loadNamespace(package, c(which.lib.loc, lib.loc)) :
  cyclic namespace dependency detected when loading 'cycle', already loading 'cycle'

It looks like this package has a loading problem: see the messages for details.

As a minimal reproduction: when a file defining both an S4 method and an ordinary function with the same name is put in the R directory of a package, and "export(myMethod)" is specified in NAMESPACE, checking or loading the package will fail with the "cyclic namespace dependency" error.

The error message is unfortunately not particularly helpful. Normally it points to a problem of reciprocal dependency between packages, but in this case it is caused only by a name being given to both an S4 method and an ordinary function.

It is straightforward to solve the issue: check carefully whether you have duplicated function/method names and fix them if there are any.

Sonntag, 13. Januar 2013

Book review: Mindset, the new psychology of success

Carol S. Dweck, Ph.D. Mindset, the new psychology of success, Ballantine Books 2008

In her book, Carol Dweck proposes and contrasts two types of mindsets: the fixed mindset and the growth mindset. Whereas the fixed mindset tends to see traits (such as intelligence, or achievement in sport) as fixed (or inborn) and desires to achieve success (and avoid failure) with little effort, the growth mindset sees intelligence and other abilities as capable of being developed through willing effort.

Though the book carries the subtitle "the new psychology of success", it does not at all teach the pursuit of fame, money or social status that are commonly associated with "success". Rather, it is about finding things that interest and challenge you, and constantly learning and developing yourself: persisting in the face of setbacks, growing from criticism and feedback, and learning lessons from the success of others. Stretching yourself to develop whenever things feel (too) easy or comfortable holds its virtue even for the talented (see countless cases in the book). It is motivating to see the self as an unfinished human being, who must learn and exert effort every day.

The book was easy to read thanks to its plain writing style, although it packs numerous cases into fewer than 250 pages. I find the contrast between the fixed and growth mindsets too often presented and interpreted too simply. The author demonstrates the two mindsets in many areas, including life, relationships, sport and career, with many vivid examples, which on the one hand makes the book potentially useful for many readers, and on the other leaves many important questions superficially discussed or not addressed at all. For instance: what are the neurological mechanisms underlying the mindsets, how are they determined in early childhood, and how are they modulated by diseases such as schizophrenia or depression? In many examples the author divides people into those with the fixed mindset and those with the growth mindset, though she points out that the two are seldom clearly separated but rather intertwined in our personalities. How, then, can one measure "fixed-ness" or "growth-ness"? Are they correlated with other psychological measures?

I may have overlooked references or passages in the 20-page notes that discuss these issues or point to work describing them; or perhaps the book is simply too easy-going and too popular to address such questions. Yet as light reading I found it inspires and motivates me to stand up against some of my own problems and to invest more of my life in the things that I care about. In short, I would recommend it, with 3.5 points out of 5.

p.s. Link to Goodreads (rating 4 out of 5):

Dienstag, 8. Januar 2013

Paper note: Nearest Template Prediction: A Single-Sample-Based Flexible Class Prediction with Confidence Assessment

Read on 7.1.2013. Hoshida Y (2010) Nearest Template Prediction: A Single-Sample-Based Flexible Class Prediction with Confidence Assessment. PLoS ONE 5(11): e15543. doi:10.1371/journal.pone.0015543

The Nearest Template Prediction (NTP) algorithm predicts sample classes using pre-defined gene signatures and a permutation test.

Suppose we are interested in a two-class prediction problem with sample classes A and B. Further suppose we have nA signature genes (markers.A) over-expressed in class A and nB signature genes (markers.B) over-expressed in class B. The template is then defined as an ordered vector of nA+nB elements: the first nA elements correspond to the genes over-expressed in class A (markers.A), and the next nB elements to the genes over-expressed in class B (markers.B). The template of class A has 1 in the positions of markers.A and -1 in those of markers.B. The template of class B is defined analogously.

Suppose we have measured the expression of N genes (N >= nA+nB) in a new sample S. First, the (raw) expression values of the signature genes are extracted in the same order as in the template defined above. The similarity between S and each of the two classes is then measured by cosine similarity (by default) or Pearson's correlation coefficient. The prediction confidence is assessed by a permutation test: nA+nB genes are randomly chosen from the N genes a large number of times (say, 10,000), and the distribution of the resulting distances is used to measure the statistical significance of the prediction.

In multi-class prediction problems, the template is a concatenation of the genes over-expressed in each class. In the template of a given class, the genes over-expressed in that class are given the value 1 and the remaining genes the value 0.
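The two-class procedure can be sketched in Python. This is a simplified, unweighted version: the function names, the convention that the marker genes come first in the expression vector, and the example data are my own assumptions, not taken from the paper.

```python
import math
import random

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def ntp_predict(sample, n_a, n_b, n_perm=10000, seed=0):
    """Nearest-template prediction for a two-class problem.

    sample: expression values of all N genes; by convention (a
    simplification for this sketch) the first n_a values are the
    class-A markers and the next n_b values the class-B markers.
    """
    signature = sample[:n_a + n_b]
    template_a = [1.0] * n_a + [-1.0] * n_b
    template_b = [-1.0] * n_a + [1.0] * n_b

    d_a = cosine_distance(signature, template_a)
    d_b = cosine_distance(signature, template_b)
    predicted, d_obs, template = (
        ("A", d_a, template_a) if d_a < d_b else ("B", d_b, template_b))

    # Permutation test: the distances of nA+nB randomly chosen genes to
    # the winning template form the null distribution.
    rng = random.Random(seed)
    hits = sum(
        cosine_distance(rng.sample(sample, n_a + n_b), template) <= d_obs
        for _ in range(n_perm))
    p_value = (hits + 1) / (n_perm + 1)
    return predicted, d_obs, p_value

# Hypothetical example: five high class-A markers, five low class-B
# markers, then 40 background genes
sample = [3.0, 2.8, 3.2, 2.9, 3.1, 0.2, 0.1, 0.3, 0.2, 0.1] + [1.5] * 40
predicted, d_obs, p_value = ntp_predict(sample, n_a=5, n_b=5)
print(predicted, round(d_obs, 3), p_value)
```

The paper additionally allows weighting the template entries (e.g. by t-statistics or fold changes) and permuting for each class template separately; both are omitted here for brevity.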

One desirable property of the NTP algorithm is that it can take a single sample as input: no control sample is needed, and no covariate structure has to be estimated from a collection of samples. It was shown to have an error rate similar to other single-sample prediction algorithms, including the Support Vector Machine (though the author did not state whether the SVM was optimized), weighted voting, CART (classification and regression tree) and k-nearest neighbour.

Compared with a Wilcoxon-Mann-Whitney (WMW) test using gene signatures, the NTP algorithm can handle over-expressed and under-expressed genes simultaneously. Besides, the null hypothesis of NTP is that the distance between the gene signature in S and the template is no smaller than that of randomly chosen genes. This is closer to what we are interested in than the null hypothesis of the WMW test, namely that the mean of the gene signature equals the background.

Still unclear to me is how the signatures are best defined. According to the authors, t-statistics, log fold changes (logFCs), signal-to-noise ratios or other measures can be used to (1) identify the signatures and (2) weight the template vector. I could not find information on which measure (or combination of measures) works best with the NTP algorithm.

Sonntag, 4. Dezember 2011

Arduino Project (ongoing)

As a young boy, I always fancied having a room full of mechanical and electronic things of my own, where I could control everything. Well, that was a bit naive - how could I possibly not dream of programming them as well?

Now there is a chance to make that dream come true - at least partially, since I cannot afford a dedicated room - thanks to the Arduino project. With a combination of low-cost, easy-to-use hardware and software, the Arduino project aims to offer an easy entry into microcontroller programming.

With no background at all in the field, I was simply excited by a few YouTube videos (one about high-speed photography, another about air drums). Judging from the documentation on the Arduino website, getting started seems to be straightforward.

Google and Amazon searches suggest there are a few starter kits available; however, most of them seem to be sold in the US or UK. There are a few interesting offers from Germany (one from Mercateo, and a very interesting one from the Fritzing project for ca. 65€). A single Arduino board costs 25€ on Amazon. I will consider whether it is worth simply buying a kit or whether I should buy the components separately.

I find the idea of building something myself with a microcontroller absolutely cool - especially as a way of combining software programming and hands-on practice. I reckon it will not be easy for me without any background; so far, however, it is fun, and I am ready to take on the challenge.

P.S. It was a bit funny how I came to know the project: I was trying to repair the wires of my bicycle's front light when it came to my mind that I needed to solder - 电焊, or Löten in German. As I searched for a Lötkolben-Set (soldering-iron set) and watched a tutorial on how to use one (I had seen it used but never practiced myself), YouTube recommended - probably based on the DIY pattern - another video, about Arduino. It was funny how a plain handicraft task on a Sunday in the basement opened up a new field :)