The Results Table (.df)
The results of a simulation are stored in the simulation object as a pandas DataFrame, accessible from the .df attribute. Each row of the table describes one simulated locus.
```python
import ipcoal
import toytree
```
Simulate data under a tree model
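```python
# a random 5-tip tree with a root height of 1e6
tre = toytree.rtree.unittree(5, treeheight=1e6)

# a coalescent model on that tree with Ne=1e6 and no recombination,
# so each locus has a single genealogy
model = ipcoal.Model(tree=tre, Ne=1e6, recomb=0.0)

# simulate 15 loci, each 100 sites long
model.sim_loci(nloci=15, nsites=100)
```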
View the results table
```python
# add an "inferred" column filled with a placeholder value, then display the table
model.df["inferred"] = "NaN"
model.df
```
|  | locus | start | end | nbps | nsnps | genealogy | inferred |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 100 | 100 | 5 | ((r0:1.03561e+06,r3:1.03... | NaN |
| 1 | 1 | 0 | 100 | 100 | 10 | ((r1:1.06273e+06,r2:1.06... | NaN |
| 2 | 2 | 0 | 100 | 100 | 20 | ((r0:3.27727e+06,r1:3.27... | NaN |
| 3 | 3 | 0 | 100 | 100 | 6 | ((r1:1.31174e+06,r2:1.31... | NaN |
| 4 | 4 | 0 | 100 | 100 | 9 | ((r0:2.15347e+06,r1:2.15... | NaN |
| 5 | 5 | 0 | 100 | 100 | 3 | ((r3:997060,r2:997060):2... | NaN |
| 6 | 6 | 0 | 100 | 100 | 7 | ((r4:1.32174e+06,r3:1.32... | NaN |
| 7 | 7 | 0 | 100 | 100 | 11 | (r2:3.59726e+06,(r3:1.25... | NaN |
| 8 | 8 | 0 | 100 | 100 | 27 | (r3:1.59951e+07,(r1:2.86... | NaN |
| 9 | 9 | 0 | 100 | 100 | 6 | ((r0:1.04082e+06,r2:1.04... | NaN |
| 10 | 10 | 0 | 100 | 100 | 16 | ((r0:2.08482e+06,r3:2.08... | NaN |
| 11 | 11 | 0 | 100 | 100 | 21 | (r1:8.04834e+06,(r0:1.41... | NaN |
| 12 | 12 | 0 | 100 | 100 | 16 | (r3:4.1391e+06,(r1:2.815... | NaN |
| 13 | 13 | 0 | 100 | 100 | 10 | (r1:2.38762e+06,(r0:1.88... | NaN |
| 14 | 14 | 0 | 100 | 100 | 26 | ((r1:1.14487e+06,r2:1.14... | NaN |
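The cell above fills the inferred column with the string "NaN". It displays exactly like a missing value, but pandas does not treat a string as missing, so methods such as .isna() and .fillna() will not see it. The sketch below uses a small stand-in DataFrame (not an ipcoal object) with nsnps values copied from the first three rows above, and shows float("nan") as one way to store a real missing value instead.

```python
import pandas as pd

# Stand-in for model.df: nsnps values copied from loci 0-2 of the table above
df = pd.DataFrame({"nsnps": [5, 10, 20]})

# Assigning a scalar to a new column broadcasts it to every row
df["inferred_str"] = "NaN"         # the literal three-character string "NaN"
df["inferred_nan"] = float("nan")  # a real floating-point missing value

# Both columns print as NaN, but only the second is recognized as missing
print(df["inferred_str"].isna().sum())  # 0
print(df["inferred_nan"].isna().sum())  # 3
```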
Save the results table to disk
```python
# write the results table to a CSV file in the current directory
model.df.to_csv("./sim-table.csv")
```
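To load the saved table back later, pandas.read_csv can restore it. Because to_csv also writes the row index as an unnamed first column, passing index_col=0 turns that column back into the index instead of adding an extra "Unnamed: 0" column. A minimal sketch, using a stand-in DataFrame whose genealogy strings are made-up short newick trees (real ones are the full trees shown above); their commas are quoted automatically on write, so they survive the round trip.

```python
import os
import tempfile

import pandas as pd

# Stand-in for model.df; the genealogy strings are hypothetical newick trees
df = pd.DataFrame({
    "locus": [0, 1, 2],
    "nsnps": [5, 10, 20],
    "genealogy": ["((r0:1,r1:1):1,r2:2);", "((r1:1,r2:1):1,r0:2);", "((r0:1,r2:1):1,r1:2);"],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sim-table.csv")
    # to_csv writes the row index as an unnamed first column and quotes
    # any field containing commas, such as the newick strings
    df.to_csv(path)
    # index_col=0 restores that first column as the index
    reloaded = pd.read_csv(path, index_col=0)

print(reloaded.equals(df))
```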
Filter based on stats
Select only the loci that contain more than 10 SNPs in the simulated sequences.
```python
model.df[model.df.nsnps > 10]
```
|  | locus | start | end | nbps | nsnps | genealogy | inferred |
|---|---|---|---|---|---|---|---|
| 2 | 2 | 0 | 100 | 100 | 20 | ((r0:3.27727e+06,r1:3.27... | NaN |
| 7 | 7 | 0 | 100 | 100 | 11 | (r2:3.59726e+06,(r3:1.25... | NaN |
| 8 | 8 | 0 | 100 | 100 | 27 | (r3:1.59951e+07,(r1:2.86... | NaN |
| 10 | 10 | 0 | 100 | 100 | 16 | ((r0:2.08482e+06,r3:2.08... | NaN |
| 11 | 11 | 0 | 100 | 100 | 21 | (r1:8.04834e+06,(r0:1.41... | NaN |
| 12 | 12 | 0 | 100 | 100 | 16 | (r3:4.1391e+06,(r1:2.815... | NaN |
| 14 | 14 | 0 | 100 | 100 | 26 | ((r1:1.14487e+06,r2:1.14... | NaN |
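The filter works because model.df.nsnps > 10 produces a boolean Series with one True/False value per locus, and indexing the DataFrame with it keeps the matching rows along with their original row labels. The sketch below reproduces this on a stand-in DataFrame holding the nsnps values copied from the full table above, and also shows the equivalent DataFrame.query form and how to combine conditions (each wrapped in parentheses and joined with &). The upper bound of 25 is an arbitrary illustrative choice.

```python
import pandas as pd

# Stand-in for model.df: nsnps values copied from the results table above (loci 0-14)
df = pd.DataFrame({"nsnps": [5, 10, 20, 6, 9, 3, 7, 11, 27, 6, 16, 21, 16, 10, 26]})

mask = df.nsnps > 10           # boolean Series, one True/False per locus
subset = df[mask]              # keeps the original row labels
same = df.query("nsnps > 10")  # equivalent string-expression form

# combine conditions: wrap each in parentheses and join with &
both = df[(df.nsnps > 10) & (df.nsnps < 25)]

print(subset.index.tolist())  # [2, 7, 8, 10, 11, 12, 14]
print(both.index.tolist())    # [2, 7, 10, 11, 12]
```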
Write data for a subset of loci
The idxs argument selects a subset of loci, by index, to be written to file. This works both for .write_loci_to_phylip(), which writes a separate file for each locus, and for .write_seqs_to_phylip(), which writes the selected loci as one concatenated sequence.
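To make the difference between the two write modes concrete, here is a minimal sketch in plain Python. It assumes a hypothetical in-memory layout of {locus index: {sample name: sequence}} with made-up bases, and the helpers to_phylip and concatenate are hypothetical too. It is not ipcoal's implementation (its writers may format names and spacing differently); it only contrasts one alignment per selected locus with a single alignment built by joining the selected loci end to end.

```python
# Hypothetical toy data: 3 loci, each an alignment of 2 samples (bases are made up)
loci = {
    0: {"r0": "ACGT", "r1": "ACGA"},
    1: {"r0": "TTGA", "r1": "TTGC"},
    2: {"r0": "GGCC", "r1": "GGCA"},
}


def to_phylip(alignment):
    """Format a {name: sequence} alignment as relaxed PHYLIP text."""
    nsites = len(next(iter(alignment.values())))
    lines = [f"{len(alignment)} {nsites}"]
    lines += [f"{name} {seq}" for name, seq in alignment.items()]
    return "\n".join(lines) + "\n"


def concatenate(loci, idxs):
    """Join the selected loci end to end, sample by sample, in the order given."""
    names = list(loci[idxs[0]])
    return {name: "".join(loci[i][name] for i in idxs) for name in names}


idxs = [0, 2]
separate = {i: to_phylip(loci[i]) for i in idxs}  # one alignment per locus
combined = to_phylip(concatenate(loci, idxs))     # one concatenated alignment
print(combined)
```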
```python
# get index numbers of the selected rows
idxs = model.df[model.df.nsnps > 10].index
```

```python
# call the write command with the selected idxs
model.write_loci_to_phylip(outdir="./ipcoal-sims", idxs=idxs)
```

```text
wrote 7 loci (5 x 100bp) to home/deren/Documents/physeqs/docs/notebooks/ipcoal-sims/[...].phy
```
```python
# look at the files that were written
! ls ipcoal-sims/
```

```text
10.phy 11.phy 12.phy 14.phy 2.phy 7.phy 8.phy
```
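ls sorts filenames as text, which is why 10.phy is listed before 2.phy. If you later loop over these files and want them in locus order, one option is to sort on the integer in each name. In the sketch below, locus_index is a hypothetical helper that assumes the first run of digits in a filename is the locus index, as it is in the names listed above (and in prefixed names such as ipcoal-10-sim.phy).

```python
import re


def locus_index(name):
    """Return the first integer found in a filename, e.g. 10 for '10.phy'."""
    return int(re.search(r"\d+", name).group())


# filenames as listed by ls above (text order, so "10" sorts before "2")
names = ["10.phy", "11.phy", "12.phy", "14.phy", "2.phy", "7.phy", "8.phy"]

ordered = sorted(names, key=locus_index)
print(ordered)  # ['2.phy', '7.phy', '8.phy', '10.phy', '11.phy', '12.phy', '14.phy']
```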
Specify names for written loci
```python
# write the same selected loci again, adding a prefix and suffix to each filename
model.write_loci_to_phylip(
    outdir="./ipcoal-sims-named",
    idxs=idxs,
    name_prefix="ipcoal-",
    name_suffix="-sim",
)
```

```text
wrote 7 loci (5 x 100bp) to home/deren/Documents/physeqs/docs/notebooks/ipcoal-sims-named/[...].phy
```
```python
# look at the files that were written with fancier names
! ls ipcoal-sims-named/
```

```text
ipcoal-10-sim.phy ipcoal-12-sim.phy ipcoal-2-sim.phy ipcoal-8-sim.phy
ipcoal-11-sim.phy ipcoal-14-sim.phy ipcoal-7-sim.phy
```
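The names listed above follow the pattern {name_prefix}{index}{name_suffix}.phy. The sketch below uses that pattern, inferred from this output rather than from ipcoal's source, to check that a write call produced one file per selected locus. Both expected_filenames and missing_files are hypothetical helpers written for this illustration.

```python
from pathlib import Path


def expected_filenames(idxs, prefix="", suffix=""):
    """Build names following the pattern seen above: '{prefix}{idx}{suffix}.phy'."""
    return [f"{prefix}{idx}{suffix}.phy" for idx in idxs]


def missing_files(outdir, idxs, prefix="", suffix=""):
    """Return the expected filenames that do not exist in outdir."""
    outdir = Path(outdir)
    return [
        name
        for name in expected_filenames(idxs, prefix, suffix)
        if not (outdir / name).exists()
    ]


# locus indices selected earlier (loci with more than 10 SNPs)
idxs = [2, 7, 8, 10, 11, 12, 14]
print(expected_filenames(idxs, prefix="ipcoal-", suffix="-sim"))
print(missing_files("./ipcoal-sims-named", idxs, prefix="ipcoal-", suffix="-sim"))
```

An empty list from missing_files means every expected file is present.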