# load EVE results
EVEresTbl <- readRDS("data/runEVE/EVEresTbl.RDS")


# Load dN/dS results
dndsRes <- readRDS("input-data/dNdS_aBSREL/dndsRes.RDS")

dndsTbl <- 
  dndsRes %>% 
  # only complete duplicates
  filter(N3 %in% filter(EVEresTbl,dup_type == "duplicate", !N3partial)$N3) %>% 
  filter(N3 != "OG1v0014281.16") %>% # this has only one N9..
  filter( !is.na(N9)) %>% 
  filter( baseline_omega < 2) %>%
  select( N3, N9, baseline_omega, uncorrected_P) %>% 
  left_join(select(EVEresTbl %>% mutate( N9partial = is.na(Okis) | is.na(Omyk) | is.na(Salp) | is.na(Ssal)),
                   N9partial, N3partial,sigDir,dupSigDir,N9), by="N9")

dN/dS distribution per category

# dndsTbl %>% 
#   ggplot() + 
#   facet_grid( . ~ dupSigDir,scales = "free") + 
#   geom_boxplot(aes(x=sigDir,y=baseline_omega,
#                    fill=sub("^$","cons",ifelse(dupSigDir %in% c("up","down","both"), sigDir, dupSigDir)))) +
#   scale_fill_manual(values = c(down="#298ABD",downdown="#1B4F8B",
#                                up="#F62D31",upup="#8B1B1A", cons="grey")) +
#   # scale_y_log10() +
#   ylab("dN/dS") +
#   theme_classic() + theme(legend.position = "none")

dndsTbl %>% 
  mutate( sigDir = sub("^$","cons",sigDir)) %>% 
  mutate( dupSigDir = sub("^$","cons",dupSigDir)) %>% 
  mutate( dupSigDir = c(up="cons+up",down="cons+down",upup="up+up",downdown="down+down",
                        cons="cons+cons",both="down+up")[dupSigDir]) %>%
  ggplot() + 
  facet_grid( . ~ dupSigDir,scales = "free") + 
  # geom_violin(aes(x=sigDir,y=baseline_omega, 
  #                  fill=ifelse(dupSigDir %in% c("up+up","down+down"), dupSigDir, sigDir))) +
  geom_boxplot(aes(x=sigDir,y=baseline_omega, 
                   fill=ifelse(dupSigDir %in% c("up+up","down+down"), dupSigDir, sigDir))) +
  scale_fill_manual(values = c(down="#298ABD","down+down"="#1B4F8B",
                               up="#F62D31","up+up"="#8B1B1A", cons="grey")) +
  ylab("dN/dS") +
  scale_y_continuous(minor_breaks = seq(0, 2, by = 0.1)) +
  theme_bw() + theme(legend.position = "none",panel.grid.major.x = element_blank(), panel.grid.minor.y = element_line())

If we perform a paired wilcox test for the duplicates with different EVE results, we find that only the down vs cons have a significantly different

dndsPaired <-
  dndsTbl %>% add_count(N3) %>% filter(n==2) %>% 
  filter( dupSigDir %in% c("both","up","down")) %>% 
  select( N3, dupSigDir, sigDir, baseline_omega) %>% 
  mutate( sigDir = sub("^$","cons",sigDir)) %>% 
  spread( key=sigDir,value=baseline_omega)

with( filter(dndsPaired, dupSigDir=="both"),
      wilcox.test(down,up,paired=T))
## 
##  Wilcoxon signed rank test
## 
## data:  down and up
## V = 389, p-value = 0.1207
## alternative hypothesis: true location shift is not equal to 0
with( filter(dndsPaired, dupSigDir=="up"),
      wilcox.test(cons,up,paired=T))
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  cons and up
## V = 15045, p-value = 0.1054
## alternative hypothesis: true location shift is not equal to 0
with( filter(dndsPaired, dupSigDir=="down"),
      wilcox.test(cons,down,paired=T))
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  cons and down
## V = 99157, p-value = 3.893e-06
## alternative hypothesis: true location shift is not equal to 0

Cumulative proportion of dN/dS for one down dups

P-value based on paired wilcox test

wilcox_dNdS_any <- 
  filter(dndsPaired, dupSigDir=="down") %>% 
  with( wilcox.test(cons,down,paired=T, alternative="less")) 

pval_text <-
  wilcox_dNdS_any %>% 
  with( sprintf("p = %.2e",p.value) )

nShiftedTblComplete <- readRDS("data/TissueAtlasAnalysis/nShiftedTblComplete.RDS")

tmp <- 
  dndsTbl %>% 
  filter(N3partial==F) %>% 
  right_join( select(EVEresTbl,N9), by="N9") %>%
  filter(dupSigDir=="down") %>% 
  group_by(N3) %>%
  filter( n() == 2) %>%
  ungroup() %>% 
  left_join(select(nShiftedTblComplete,N3,nShifted),by="N3")

# # wilcoxon unpaired test for all tissues down vs any tissues cons
# pval_text2 <-
#   wilcox.test(filter(tmp, sigDir=="down", nShifted==15)$baseline_omega,
#             filter(tmp, sigDir=="")$baseline_omega) %>% 
#   with( sprintf("p = %.02f",p.value) )

# paired test all tissues down vs all tissues cons
wilcox_dNdS_all <-
  dndsPaired %>%
  left_join( select(nShiftedTblComplete,N3,nShifted), by="N3") %>%
  filter(nShifted==15) %>%
  filter(dupSigDir=="down") %>%
  with( wilcox.test(cons,down,paired=T, alternative="less")) 

pval_text2 <-
  wilcox_dNdS_all %>%
  with( sprintf("p = %.03f",p.value) )


bind_rows(
  .id = "shifted",
  all = filter(tmp, nShifted==15),
  any = tmp
) %>% 
  group_by(sigDir, shifted) %>% 
  arrange(baseline_omega) %>% 
  mutate( cumProp = seq(0,1,length.out = n())) %>% 
  ungroup() %>% 
  mutate( sigDir = ifelse(sigDir=="","cons","down")) %>% 
  filter( !(shifted=="all" & sigDir=="cons")) %>% 
  ggplot( ) + 
  geom_line(aes(x=baseline_omega,y=cumProp,color=sigDir,linetype=shifted)) +
  scale_color_manual(values=c(down="#298ABD",cons="grey"), name="") +
  theme_bw() + xlab("dN/dS") + ylab("Cumulative proportion") +
  scale_linetype_manual(values = c( any=1, all=2)) +
  annotate(geom = "text",x=0.5,y=0.65, hjust=0, color="#298ABD",
           label=paste0("down-shift\n",pval_text)) +
  annotate(geom = "text", x=0.3,y=0.4, hjust=0, color="#298ABD",
           label=paste0("down-shift, all tissues\n", pval_text2)) +
  coord_cartesian(xlim=c(0,1))+
  ggtitle("dN/dS ratio among one-down duplicates")

The P-value for all tissues is higher (less significant) even though the plot indicates that the dN/dS difference is greater than for any tissue. One explanation for this is the difference in number of trees in each set, i.e. N_all = 117 vs N_any = 711. Also the dN/dS of the conserved counterpart of the “all tissues” trees is not shown and it is a bit higher than for any tissues.