The Results Table (.df)

Simulation results are stored on the simulation object as a pandas DataFrame, accessible through the .df attribute.

[1]:
import ipcoal
import toytree

Simulate data under a tree model

[2]:
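# generate a random ultrametric tree with 5 tips and a root height of 1e6
tre = toytree.rtree.unittree(5, treeheight=1e6)
[3]:
# set up a coalescent model on the tree with Ne=1e6 and no recombination
model = ipcoal.Model(tree=tre, Ne=1e6, recomb=0.0)
[4]:
# simulate 15 loci, each 100 sites long
model.sim_loci(nloci=15, nsites=100)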

View the results table

[5]:
# add a placeholder "inferred" column (filled with the string "NaN"), then display the table
model.df["inferred"] = "NaN"
model.df
[5]:
    locus  start  end  nbps  nsnps  genealogy                    inferred
0       0      0  100   100      5  ((r0:1.03561e+06,r3:1.03...  NaN
1       1      0  100   100     10  ((r1:1.06273e+06,r2:1.06...  NaN
2       2      0  100   100     20  ((r0:3.27727e+06,r1:3.27...  NaN
3       3      0  100   100      6  ((r1:1.31174e+06,r2:1.31...  NaN
4       4      0  100   100      9  ((r0:2.15347e+06,r1:2.15...  NaN
5       5      0  100   100      3  ((r3:997060,r2:997060):2...  NaN
6       6      0  100   100      7  ((r4:1.32174e+06,r3:1.32...  NaN
7       7      0  100   100     11  (r2:3.59726e+06,(r3:1.25...  NaN
8       8      0  100   100     27  (r3:1.59951e+07,(r1:2.86...  NaN
9       9      0  100   100      6  ((r0:1.04082e+06,r2:1.04...  NaN
10     10      0  100   100     16  ((r0:2.08482e+06,r3:2.08...  NaN
11     11      0  100   100     21  (r1:8.04834e+06,(r0:1.41...  NaN
12     12      0  100   100     16  (r3:4.1391e+06,(r1:2.815...  NaN
13     13      0  100   100     10  (r1:2.38762e+06,(r0:1.88...  NaN
14     14      0  100   100     26  ((r1:1.14487e+06,r2:1.14...  NaN

Save the results table to disk

[6]:
model.df.to_csv("./sim-table.csv")
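
To load a saved table back into pandas later, something like the following sketch should work. It uses a small stand-in DataFrame with the same column names as the table above (the rows and genealogy strings are made up for illustration) and an in-memory buffer in place of "./sim-table.csv", so it runs without ipcoal.

```python
import io

import pandas as pd

# Stand-in for model.df: same column names as the results table, made-up rows.
df = pd.DataFrame({
    "locus": [0, 1, 2],
    "start": [0, 0, 0],
    "end": [100, 100, 100],
    "nbps": [100, 100, 100],
    "nsnps": [5, 10, 20],
    "genealogy": [
        "((a:1,b:1):1,c:2);",
        "((a:1,c:1):1,b:2);",
        "((b:1,c:1):1,a:2);",
    ],
})

# Write to an in-memory buffer; a file path such as "./sim-table.csv"
# can be passed to to_csv() and read_csv() in the same way.
buffer = io.StringIO()
df.to_csv(buffer)

# index_col=0 restores the saved row index instead of loading it
# as an extra unnamed column.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Without index_col=0, the row index written by to_csv() comes back as an additional data column.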

Filter based on stats

Select only the loci whose simulated sequences contain more than 10 SNPs.

[7]:
model.df[model.df.nsnps > 10]
[7]:
    locus  start  end  nbps  nsnps  genealogy                    inferred
2       2      0  100   100     20  ((r0:3.27727e+06,r1:3.27...  NaN
7       7      0  100   100     11  (r2:3.59726e+06,(r3:1.25...  NaN
8       8      0  100   100     27  (r3:1.59951e+07,(r1:2.86...  NaN
10     10      0  100   100     16  ((r0:2.08482e+06,r3:2.08...  NaN
11     11      0  100   100     21  (r1:8.04834e+06,(r0:1.41...  NaN
12     12      0  100   100     16  (r3:4.1391e+06,(r1:2.815...  NaN
14     14      0  100   100     26  ((r1:1.14487e+06,r2:1.14...  NaN
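
Because .df is an ordinary pandas DataFrame, several conditions can be combined in one filter. The sketch below is plain pandas rather than anything specific to ipcoal; it uses a stand-in DataFrame that holds only the nsnps values from the table above, and the upper bound of 25 is an arbitrary choice for illustration.

```python
import pandas as pd

# Stand-in for model.df: only the nsnps column, with the values shown above.
df = pd.DataFrame(
    {"nsnps": [5, 10, 20, 6, 9, 3, 7, 11, 27, 6, 16, 21, 16, 10, 26]}
)

# Same filter as above: a boolean mask on a single column.
many_snps = df[df.nsnps > 10]

# Combine conditions with & (and) or | (or); each condition needs
# its own parentheses.
mid_range = df[(df.nsnps > 10) & (df.nsnps < 25)]

# The same selection written as a query string.
mid_range_query = df.query("nsnps > 10 and nsnps < 25")

print(list(many_snps.index))  # [2, 7, 8, 10, 11, 12, 14]
print(list(mid_range.index))  # [2, 7, 10, 11, 12]
```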

Write data for a subset of loci

The idxs argument selects which locus indices are written to file. It works with both .write_loci_to_phylip(), which writes a separate file for each locus, and .write_seqs_to_phylip(), which writes a single concatenated sequence.

[8]:
# get index numbers of the selected rows
idxs = model.df[model.df.nsnps > 10].index
[9]:
# call the write command with the selected idxs
model.write_loci_to_phylip(outdir="./ipcoal-sims", idxs=idxs)
wrote 7 loci (5 x 100bp) to home/deren/Documents/physeqs/docs/notebooks/ipcoal-sims/[...].phy
[10]:
# look at the files that were written
! ls ipcoal-sims/
10.phy  11.phy  12.phy  14.phy  2.phy  7.phy  8.phy
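
To check the written files, they can be read back with a few lines of Python. The reader below is a minimal sketch that assumes a simple sequential PHYLIP layout (a header line with the number of taxa and sites, then one "name sequence" line per sample); the exact layout ipcoal writes is not shown here, so treat this as an assumption. read_phylip is a hypothetical helper, and the example file content is made up.

```python
import tempfile
from pathlib import Path


def read_phylip(path):
    """Parse a simple sequential PHYLIP file into a {name: sequence} dict."""
    lines = Path(path).read_text().splitlines()

    # Header line: number of taxa and number of sites.
    ntaxa, nsites = (int(x) for x in lines[0].split())

    seqs = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        # The name is the first whitespace-delimited token; the rest is sequence.
        name, seq = line.split(None, 1)
        seqs[name] = seq.replace(" ", "")

    # Check the parsed data against the header.
    if len(seqs) != ntaxa:
        raise ValueError(f"expected {ntaxa} taxa, found {len(seqs)}")
    if any(len(seq) != nsites for seq in seqs.values()):
        raise ValueError(f"not all sequences have {nsites} sites")
    return seqs


# Made-up example file, not real ipcoal output.
example = "3 8\nr0    ACGTACGT\nr1    ACGTACGA\nr2    ACGAACGT\n"
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.phy"
    path.write_text(example)
    seqs = read_phylip(path)

print(seqs)
```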

Specify names for written loci

[11]:
# call the write command with the selected idxs and a custom name prefix and suffix
model.write_loci_to_phylip(
    outdir="./ipcoal-sims-named",
    idxs=idxs,
    name_prefix="ipcoal-",
    name_suffix="-sim",
)
wrote 7 loci (5 x 100bp) to home/deren/Documents/physeqs/docs/notebooks/ipcoal-sims-named/[...].phy
[12]:
# look at the files that were written with fancier names
! ls ipcoal-sims-named/
ipcoal-10-sim.phy  ipcoal-12-sim.phy  ipcoal-2-sim.phy  ipcoal-8-sim.phy
ipcoal-11-sim.phy  ipcoal-14-sim.phy  ipcoal-7-sim.phy
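
Judging from the listing above, each file appears to be named as prefix + locus index + suffix + ".phy". This pattern is inferred from the output shown here, not taken from the ipcoal documentation. Under that assumption, a short sketch can build the expected file names for a set of indices and report any that are missing from the output directory; expected_names is a hypothetical helper.

```python
from pathlib import Path


def expected_names(idxs, prefix="", suffix=""):
    """Build file names following the pattern seen in the listing above."""
    return [f"{prefix}{idx}{suffix}.phy" for idx in idxs]


# Locus indices selected earlier (nsnps > 10).
idxs = [2, 7, 8, 10, 11, 12, 14]
names = expected_names(idxs, prefix="ipcoal-", suffix="-sim")

# Report which expected files are not present in the output directory.
outdir = Path("./ipcoal-sims-named")
missing = [name for name in names if not (outdir / name).exists()]

print(names)
print("missing:", missing)
```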