Effective population size (Ne)

The fundamental parameter of the coalescent is the effective population size (Ne). This determines the expected waiting time until N samples in a population coalesce to a common ancestor. In ipcoal we are interested in a structured coalescent, where a population tree or network enforces a hierarchical model where individuals from distinct populations cannot coalesce with each other until the populations merge back in time (unless there is migration).

In ipcoal you can easily set up simulations on empiciral trees or networks with variable Ne values across different edges. This is demonstrated here, as well as some visualization tools for verifying expected results.

[42]:
import toytree
import ipcoal
import numpy as np

This tree drawing style that will be used in following plots.

[43]:
# drawing style
tstyle = {
    'layout': 'd',
    'edge_type': 'c',
    'tip_labels': True,
    'node_labels': False,
    'node_sizes': 8,
    'node_style': {
        "stroke": "#262626",
        "stroke-width": 1,
    },
    'scalebar': True,
}

Simulate with low ILS

We can start with the simplest simulation in which Ne is set to a very low value such that samples within populations coalesce very rapidly, and before any population divergence events. We do not expect to obseve any ILS in this case – genealogies should match the species tree.

[44]:
# generate a random species tree
tre = toytree.rtree.unittree(8, treeheight=1e5, seed=123)

# simulate 10 unlinked SNPs on this tree
model1 = ipcoal.Model(tree=tre, Ne=2, seed=123)
model1.sim_snps(10)
[52]:
# load gene trees to a multitree
mtre = toytree.mtree(model1.df.genealogy)

# draw the species tree
c, a = tre.draw(**tstyle, width=225, height=200);
c.text(c.width / 2, 20, "<b>Species tree</b>");

# draw the first 4 genealogies
c, a = mtre.draw_tree_grid(
    shared_axis=True,
    fixed_order=True,
    width=600, height=200, ncols=4, nrows=1,
    **tstyle)
c.text(c.width / 2, 20, "<b>Genealogies</b>");
r6r4r7r2r5r0r3r1 050000100000Species tree
r0r5r3r1r4r6r2r7 r0r5r3r1r4r6r2r7 r0r5r3r1r4r6r2r7 r0r5r3r1r4r6r2r7 050004100008Genealogies

Simulate with high ILS

By setting the Ne value higher we expect to observe longer waiting times to coalescence, and likely coalescence events that predate population divergences such that ILS may occur.

[11]:
# generate species tree
tre = toytree.rtree.unittree(8, treeheight=1e5, seed=123)

# simulate 10 unlinked SNPs on model with high Ne
model2 = ipcoal.Model(tree=tre, Ne=1e5, seed=123)
model2.sim_snps(10)
[62]:
# load gene trees to a multitree
mtre = toytree.mtree(model2.df.genealogy)

# draw the species tree
c, a = tre.draw(**tstyle, width=225, height=200);
c.text(c.width / 2, 20, "<b>Species tree</b>");

# draw the genealogies
c, a = mtre.draw_tree_grid(
    shared_axis=True,
    fixed_order=True,
    width=600, height=250, ncols=4, nrows=1,
    **tstyle)
c.text(c.width / 2, 20, "<b>Genealogies</b>");
r6r4r7r2r5r0r3r1 050000100000Species tree
r1r0r2r3r5r7r4r6 r1r0r2r3r5r7r4r6 r1r0r2r3r5r7r4r6 r1r0r2r3r5r7r4r6 0447503895006Genealogies

Simulate with variable Ne

You can use toytree to set Ne values to vary among nodes by setting a feature to each node called “Ne”. This is demonstrated below using the .set_node_values() function call.

[64]:
# init a random tree and draw it
tre = toytree.rtree.baltree(8, treeheight=1e6)

# set specific values to some nodes and a global default
tre = tre.set_node_values(
    attr="Ne",
    values={8: 5e6, 9: 5e6, 12: 5e6},
    default=20,
)
[67]:
# draw to show edge widths and node idxs
tre.draw(
    width=300, height=250,
    edge_widths=np.log(tre.get_edge_values("Ne")),
    tree_style='c',
    tip_labels=True,
);
r0r1r2r3r4r5r6r7idx: 0 name: r0 dist: 333333.3333 support: 100 height: 0.00000idx: 1 name: r1 dist: 333333.3333 support: 100 height: 0.00001idx: 2 name: r2 dist: 333333.3333 support: 100 height: 0.00002idx: 3 name: r3 dist: 333333.3333 support: 100 height: 0.00003idx: 4 name: r4 dist: 333333.3333 support: 100 height: 0.00004idx: 5 name: r5 dist: 333333.3333 support: 100 height: 0.00005idx: 6 name: r6 dist: 333333.3333 support: 100 height: 0.00006idx: 7 name: r7 dist: 333333.3333 support: 100 height: 0.00007idx: 8 name: 8 dist: 333333.3333 support: 100 height: 333333.33338idx: 9 name: 9 dist: 333333.3333 support: 100 height: 333333.33339idx: 10 name: 10 dist: 333333.3333 support: 100 height: 333333.333310idx: 11 name: 11 dist: 333333.3333 support: 100 height: 333333.333311idx: 12 name: 12 dist: 333333.3333 support: 100 height: 666666.666712idx: 13 name: 13 dist: 333333.3333 support: 100 height: 666666.666713idx: 14 name: 14 dist: 333333.3333 support: 100 height: 1000000.00001405000001000000

As you can see in the tree above, the populations sizes are very small on one half of the tree, and very high on the other. This should lead to a characteristic genealogy shape with deep coalescences only for lineages 0, 1, 2, and 3.

[35]:
# init the model
model3 = ipcoal.Model(tree=tre, seed=12345)
[69]:
# load gene trees to a multitree
mtre = toytree.mtree(model3.df.genealogy)

# draw the species tree
c, a = tre.draw(
    width=300, height=250,
    edge_widths=np.log(tre.get_edge_values("Ne")),
    **tstyle,
);
c.text(c.width / 2, 20, "<b>Species tree</b>");

# draw the genealogies
c, a = mtre.draw_tree_grid(
    shared_axis=True,
    fixed_order=tre.get_tip_labels(),
    width=600, height=250, ncols=4, nrows=1,
    **tstyle)
c.text(c.width / 2, 20, "<b>Genealogies</b>");
r0r1r2r3r4r5r6r7 05000001000000Species tree
r0r1r2r3r4r5r6r7 r0r1r2r3r4r5r6r7 r0r1r2r3r4r5r6r7 r0r1r2r3r4r5r6r7 05000471000094Genealogies