
biojava-phylo module extension #346

@lafita


I would like to make some changes to the phylo module of biojava. First, I would like to factor the pairwise distance calculation out of TreeConstructor.java, so that more scoring functions for the evolutionary distance can be added easily. In particular, I have in mind a structural similarity score for cases where the alignment comes from a structural alignment (some studies report that the resulting trees are more accurate and less uncertain). Currently, percentage identity (PID) and substitution-matrix scores are implemented.
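
A minimal sketch of the proposed factoring, assuming aligned sequences are given as equal-length strings with `-` for gaps. The names `PairwiseDistance`, `PidDistance` and `DistanceMatrixBuilder` are hypothetical and not part of the current biojava API. The matrix builder only sees the interface, so a structural similarity score would simply be one more implementation.

```java
import java.util.List;

// Hypothetical interface (not the biojava API): an evolutionary distance
// between two aligned sequences of equal length.
interface PairwiseDistance {
    double distance(String alignedA, String alignedB);
}

// PID-based distance: 1 - (identical columns / compared columns),
// where columns with a gap in either sequence are skipped.
class PidDistance implements PairwiseDistance {
    @Override
    public double distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Aligned sequences must have equal length");
        }
        int compared = 0;
        int identical = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x == '-' || y == '-') {
                continue; // ignore gapped columns
            }
            compared++;
            if (x == y) {
                identical++;
            }
        }
        // Assumption: with no comparable columns, treat the pair as maximally distant.
        return compared == 0 ? 1.0 : 1.0 - (double) identical / compared;
    }
}

// Builds the symmetric distance matrix without knowing which scoring function is used.
final class DistanceMatrixBuilder {
    private DistanceMatrixBuilder() {}

    static double[][] build(List<String> alignment, PairwiseDistance metric) {
        int n = alignment.size();
        double[][] d = new double[n][n]; // diagonal stays 0.0
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double v = metric.distance(alignment.get(i), alignment.get(j));
                d[i][j] = v;
                d[j][i] = v;
            }
        }
        return d;
    }
}
```

For example, `DistanceMatrixBuilder.build(List.of("ACGT", "ACGA", "AC-A"), new PidDistance())` gives 0.25 between the first two sequences and 0.0 between the last two.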

I have also noticed that some code was ported from Jalview and is either unused or redundant with other biojava code. One example is the ResidueProperties.java class, which contains hard-coded substitution matrices (PAM250, BLOSUM62) and many color assignments and options that are never used in biojava (I suppose they are used by the Jalview GUI). As for the matrices, the biojava-alignment module already contains these two substitution matrices, and many more, in its resources folder, but the phylo module does not depend on the alignment module, so it cannot access them. I think it would be good to add the alignment module as a dependency of the phylo module, since generating a sequence alignment is a prerequisite for phylogenetic tree construction.
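
A rough sketch of how matrix-based distances could stop relying on hard-coded tables: the distance code only depends on a score lookup, so the matrix can come from anywhere, such as the files shipped with biojava-alignment. `SubstitutionScores`, `MapScores` and `ScoreBasedDistance` are hypothetical names. The tiny matrix used in the usage example holds made-up toy values, not real BLOSUM62 entries. The normalization `(Smax - Sab) / Smax` is only one illustrative choice among several.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction (not the biojava API): any source of residue-pair scores,
// e.g. a matrix loaded from shared resource files instead of hard-coded tables.
@FunctionalInterface
interface SubstitutionScores {
    int score(char x, char y);
}

// Toy symmetric matrix backed by a map; pairs not stored get a default score.
final class MapScores implements SubstitutionScores {
    private final Map<String, Integer> table = new HashMap<>();
    private final int defaultScore;

    MapScores(int defaultScore) {
        this.defaultScore = defaultScore;
    }

    MapScores put(char x, char y, int s) {
        table.put("" + x + y, s);
        table.put("" + y + x, s); // store both orders so lookups are symmetric
        return this;
    }

    @Override
    public int score(char x, char y) {
        return table.getOrDefault("" + x + y, defaultScore);
    }
}

final class ScoreBasedDistance {
    private ScoreBasedDistance() {}

    // Sum of substitution scores over columns where neither sequence has a gap.
    static int alignmentScore(String a, String b, SubstitutionScores m) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Aligned sequences must have equal length");
        }
        int total = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x != '-' && y != '-') {
                total += m.score(x, y);
            }
        }
        return total;
    }

    // Illustrative normalization only: d = (Smax - Sab) / Smax, Smax = max(Saa, Sbb).
    static double distance(String a, String b, SubstitutionScores m) {
        int sab = alignmentScore(a, b, m);
        int smax = Math.max(alignmentScore(a, a, m), alignmentScore(b, b, m));
        if (smax <= 0) {
            throw new IllegalArgumentException("Self-scores must be positive");
        }
        return (double) (smax - sab) / smax;
    }
}
```

With the toy matrix `new MapScores(-1).put('A', 'A', 4).put('C', 'C', 9).put('A', 'C', 0)`, the distance between `AC` and `AA` is 9/13. Replacing the toy map with a matrix read from the alignment module's resources would leave `ScoreBasedDistance` unchanged.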

Lastly, some names in the module are confusing. For example, the TreeConstructionAlgorithm.java enum has BLOSUM and PID as its fields. Its name, however, suggests one of the algorithms used to build trees, like UPGMA or Neighbor Joining (NJ). Those are instead listed in the TreeType.java enum, a name that should stand for one of the three tree types (Distance Tree, Maximum-Likelihood Tree or Parsimony Tree). To be consistent with the literature, the BLOSUM and PID fields should be grouped into something named like ScoreMatrixType.
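
One possible layout for the renamed enums, sketched with hypothetical names. `ScoreMatrixType` is the name suggested above. `TreeConstructorType` is an assumed name for the enum holding UPGMA and NJ. Linking each algorithm to the tree type it produces is an illustrative design choice, not existing biojava behavior.

```java
// The three families of phylogenetic trees.
enum TreeType {
    DISTANCE, MAXIMUM_LIKELIHOOD, PARSIMONY
}

// Algorithms that build a tree; both listed here produce distance trees.
enum TreeConstructorType {
    UPGMA(TreeType.DISTANCE),
    NJ(TreeType.DISTANCE);

    private final TreeType treeType;

    TreeConstructorType(TreeType treeType) {
        this.treeType = treeType;
    }

    TreeType treeType() {
        return treeType;
    }
}

// Scores used to derive pairwise evolutionary distances.
enum ScoreMatrixType {
    PID, BLOSUM62, PAM250
}
```

With the concepts separated, a tree-building method could take one value from each enum (for example, a hypothetical `build(TreeConstructorType, ScoreMatrixType)`), so the two independent choices stay visible in the signature.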

Does anyone have comments on (or objections to) these changes?
Is there any important point I am missing?

Aleix

Labels: enhancement (Improvement of existing code or method), new feature (New method or data structure)

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions