
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Professional Formal Letter
% LaTeX Template
% Version 2.0 (12/2/17)
%
% This template originates from:
% http://www.LaTeXTemplates.com
%
% Authors:
% Brian Moses
% Vel (vel@LaTeXTemplates.com)
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%----------------------------------------------------------------------------------------
%	PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------

\documentclass[12pt, a4paper]{letter} % Set the font size (10pt, 11pt and 12pt) and paper size (letterpaper, a4paper, etc)

\input{structure.tex} % Include the file that specifies the document structure

\newcommand{\bibsection}[1]{}
\newcommand{\section}[1]{}
\newcommand{\newblock}{}

\newenvironment{thebibliography}[1]%
      {References\begin{description}}{\end{description}}
   \newcommand{\htmlbibitem}[2]{\label{#2}\item[{[#1]}]}
\usepackage[authoryear]{natbib}

\newenvironment{reply}{$\triangleright$\bf}{$\triangleleft$}
\renewenvironment{quote}
               {\list{}{\rightmargin\leftmargin}%
                \item\relax\normalfont}
               {\endlist}

\setlength\parskip{\bigskipamount} \setlength\parindent{0pt}

%\longindentation=0pt % Un-commenting this line will push the closing "Sincerely," and date to the left of the page

%----------------------------------------------------------------------------------------
%	YOUR INFORMATION
%----------------------------------------------------------------------------------------


\Who{Dr. Luiz Max de Carvalho} % Your name

\Title{, PhD} % Your title, leave blank for no title
\authordetails{
	School of Applied Mathematics\\ % Your department/institution
	Praia de Botafogo, 190\\ % Your address
	Rio de Janeiro, RJ, 22250-900\\ % Your city, zip code, country, etc
	Email: lmax.fgv@gmail.com \\ % Your email address
	Phone: +55 21 3799-2348 \\ % Your phone number
% 	URL: LaTeXTemplates.com % Your URL
}

%----------------------------------------------------------------------------------------
%	HEADER CONTENTS
%----------------------------------------------------------------------------------------
\logo{emap.png}
% \logo{Marca_FGV_EMAp_colorida.png}

\headerlinetwo{Getúlio Vargas Foundation (FGV)} % Top header line, leave blank if you only want the bottom line

% \headerlinetwo{School of Applied Mathematics (EMAp)} % Bottom header line

%----------------------------------------------------------------------------------------

\begin{document}

\input{FMDV_AMERICA.xtr} 

\bibliographystyle{mbe}

\begin{letter}{
	Dr. Santiago F. Elena\\
    Editor-in-Chief \\
    Virus Evolution
}

%----------------------------------------------------------------------------------------
%	LETTER CONTENT
%----------------------------------------------------------------------------------------

\opening{Dear Dr.~Elena,}

I would like to submit the manuscript entitled ``Spatio-temporal Dynamics of Foot-and-Mouth Disease Virus in South America'' for consideration for publication in \textit{Virus Evolution}.

This manuscript was originally submitted to~\textit{Virus Evolution} in 2015 and was assigned manuscript ID VEVOLU-2015-010. 
After the first round of revisions, I was unfortunately unable to perform the extensive modifications requested by the referees.
After concluding my PhD in 2019, I was able to return to this project and to re-collect and re-analyse the data using a better analytical framework that explicitly accounts for collection (sampling) bias, including the techniques in~\cite{Karcher2020}.

It is our understanding that the manuscript brings a substantial methodological advance that sheds light on the dynamics of an important livestock virus at a continental level -- previous analyses have been restricted to single countries (e.g. Ecuador and Argentina) or regions (e.g. Andes).
Thus, we consider the manuscript to be of interest to the readership of~\textit{Virus Evolution} both from a methodological and an applied point of view.

As an appendix to this letter, I provide a point-by-point response to the comments of each reviewer, clarifying the improvements made.
In preparing our revision, we have carefully considered the helpful suggestions and critiques from yourself and the three Reviewers.
Our responses appear in bold, and the comments we received appear in normal text.
Significant changes to the manuscript are given in quotes.

\closing{Sincerely,}

\clearpage

%====================
\textbf{Editor-in-Chief}
%====================

Dear Mr. Carvalho,

Manuscript ID VEVOLU-2015-010 entitled ``Spatio-temporal Dynamics of Foot-and-Mouth Disease Virus in South America'' which you submitted to Virus Evolution, has been reviewed.  
The comments of the reviewer(s) are included at the foot of this letter.

The three reviewers have very diverse opinions, from rejection (2nd reviewer) to minor revisions (3rd reviewer), thus the decision cannot be other but to request you to perform a major revision.  
Therefore, I invite you to respond to the reviewer(s)' comments and revise your manuscript.  
Please notice that given the nature of these comments and of the required amount of modifications, the new version will be send out for a second round of review, most likely to the same reviewers.

Once again, thank you for submitting your manuscript to Virus Evolution and I look forward to receiving your revision.

Sincerely,\\
Dr. Santiago Elena\\
Editor-in-Chief\\
Virus Evolution

\begin{reply}
We thank the Editor-in-Chief and agree that the Reviewers' comments have helped us improve our manuscript. 
As the referees shared some of the same concerns, we first address those in general comments.
\end{reply}


\textbf{General comment \#1: New data}

\begin{reply}
A major criticism of our original submission was that we did not use all of the publicly available data.
To address this we conducted a thorough search of GenBank, now described in the Methods section of the paper:
\begin{quote}
We retrieved all FMDV nucleotide sequences with more than $600$ bp available from GenBank~\citep{Benson2013}, hosted by the National Center for Biotechnology Information (NCBI, \url{http://www.ncbi.nlm.nih.gov/}).
This first step yielded $6{,}907$ sequences, which were then filtered to exclude all sequences that did not include the 1D (VP1) gene, resulting in $4{,}507$ sequences being kept.
We then filtered for sequences from serotypes A and O, yielding $1{,}051$ and $2{,}350$ sequences, respectively.
Next, we excluded sequences that had been extensively passaged in cell culture and selected all sequences from South America (Argentina, Bolivia, Brazil, Colombia, Ecuador, Paraguay, Peru, Uruguay, Venezuela) for which information on country and year of isolation was available.
\end{quote}
This procedure led to $53$ additional sequences for serotype A and $43$ additional sequences for serotype O, compared to our original submission.
\end{reply}


\textbf{General comment \#2: Accommodating temporal and spatial sampling bias}

\begin{reply}
Even with broader sampling, the use of observational data brings with it the concern that temporal and spatial sampling bias might lead to incorrect inferences.
We address this important concern by employing analytical methods that explicitly accommodate the possibility of sampling bias.

On the temporal side, we employ the modelling framework of~\cite{Karcher2020} to fit various coalescent-based models that specify the dependence of the sampling process on the population size or other temporal factors, while also accounting for phylogenetic uncertainty.
Using marginal likelihood estimation, we can compare these models to one another and infer not only whether there is significant temporal bias but also possible explanatory factors.
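
As a sketch of the idea (the notation below is chosen for this letter and need not match the manuscript), sampling times are treated as an inhomogeneous Poisson process whose intensity $\lambda(t)$ may depend on the effective population size $N_e(t)$ and on further time-varying covariates $f_k(t)$:
\[
  \log \lambda(t) = \beta_0 + \beta_1 \log N_e(t) + \sum_{k=2}^{K} \beta_k f_k(t).
\]
Under this formulation, $\beta_1 = 0$ corresponds to sampling intensity that does not depend on population size, and the competing models differ in which terms are included.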

To account for spatial bias, we employed a generalized linear model (GLM)~\citep{Lemey2014,Dudas2017} that allows for several predictors of spatial spread to be included simultaneously.
This has the benefit of allowing one to construct predictors that account for possible sampling bias, such as the difference in numbers of sequences between locations and also the numbers of sequences at both origin and destination.
By assessing the posterior inclusion probability of these sampling bias proxies, we can assess whether they contribute significantly to estimated dispersal rates.
This framework also allows consideration of further predictors~\textit{in addition to} the sampling bias ``controls''.
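
Schematically (again in notation chosen for this letter, which should be read as an illustrative assumption rather than the exact parameterisation in the manuscript), the dispersal rate $\Lambda_{ij}$ from location $i$ to location $j$ is modelled on the log scale as a function of $P$ predictors $x_{ijp}$:
\[
  \log \Lambda_{ij} = \sum_{p=1}^{P} \beta_p \, \delta_p \, x_{ijp},
  \qquad \delta_p \in \{0, 1\},
\]
where $\beta_p$ is the effect size of predictor $p$ and $\delta_p$ indicates whether it is included in the model.
Writing $\hat{\pi}_p$ for the posterior inclusion probability of predictor $p$ and $q_p$ for its prior inclusion probability, support for inclusion can be summarised by the Bayes factor
\[
  \mathrm{BF}_p = \frac{\hat{\pi}_p / (1 - \hat{\pi}_p)}{q_p / (1 - q_p)}.
\]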

In summary, we have updated the statistical methods in the paper so as to accommodate and test for sampling bias.
It should be noted, however, that these methods are not a silver bullet; a biased sample remains a biased sample, and inferences will be affected to some degree.
Principled statistical methods can nonetheless help mitigate the bias and recover the underlying patterns in the data.
\end{reply}

%====================
\textbf{Reviewer \#1}
%====================

The manuscript by Carvalho et al., describes the analysis of foot-and-mouth disease viruses (FMDVs) of two different serotypes (O and A) based on partial genome sequences from samples collected during a 55 year period for serotype A and 16 years for serotype O. 
The sequence analysis data is linked to studies on the trade of FMDV susceptible animals, e.g. cattle and pigs. 
The study has some interest but from my point of view, there seem to be some surprising omissions of information which may, or may not, affect the conclusions that can be drawn.

Specific points

1)      In the Abstract, the text indicates that serotype O emerged (in South America) in around 1990. 
This is an odd statement to make since there are well known FMDV serotype O strains that predate this, e.g. O11 Campos (from Brazil in 1958), O1 Argentina (c. 1965) and O/M11/MEX from Mexico in 1952 which have all been sequenced in part and accession numbers are available (cited in Wright et al., (2013) Infect Genetics Evolution 20, 230-238).  
How does consideration of such sequences influence the information about the date of introduction of the viruses and their circulation in South America? 
In the Introduction, the authors indicate that: ``Historically, serotype O has been the most prevalent serotype on the continent''? (lines 53-55, P1). 
The authors need to explain why the earlier strains of FMD virus were not included in their analyses.

\begin{reply}
We thank the reviewer for catching this.
Firstly, the Mexico sample(s) was not included because we chose to restrict attention to South America.
Secondly, we updated our data sets to include many more sequences.
Please see General Comment \#1 for more information.
\end{reply}

2)      It is curious that there is no mention of serotype C FMDV in South America.

\begin{reply}
Serotype C was indeed mentioned in the introduction (line 55): ``Serotype C on the other hand was last encountered in the continent in 1995 in Brazil''.
\end{reply}

3)      The sequence analysis is based on the VP1 coding region, this only represents about 630 nt out of a complete FMDV genome of about 8400nt. 
This information is not presented within the Introduction or Results section of the manuscript and only becomes apparent from Figure legends and Material \& Methods (i.e. it is quite well hidden). 
The resolution of the analyses based on VP1 coding region sequence information alone is necessarily limited. 
Full genome (or near full genome) sequencing can give much higher resolution, e.g. to the level of farm-to-farm spread (e.g. see Valdazo-Gonzalez et al., (2012) PLosOne 7(11) e49650).

\begin{reply}
The fact that we use VP1 sequences is explicit in the methodology and all of our data and code are publicly available.
We chose VP1 because it is routinely used in molecular epidemiology studies of FMDV, and is by far the most abundant gene in terms of number of sequences on GenBank.
In any case, we have now added a whole section at the end of the paper called ``Limitations of this study'' to address concerns of partial versus full genomes, which we agree is an important caveat.
We have also included a citation of Valdazo-Gonzalez et al. (2012) in our discussion of full genomes~\textit{versus} partial sequences.
\end{reply}
%GB: should we elaborate (here) more on the reason(s) behind our choice for VP1? LM: Done.

4) The nature of the samples used for the virus sequence determination is also not indicated, have some been extensively passaged in cell culture? 
This information should be available using the accession numbers of the published sequences and clearly can influence the outcome of sequence comparisons.

\begin{reply}
We excluded sequences coming from samples that had been extensively passaged. 
The Supplementary Material includes a list of all sequences used, as well as those excluded based on this and other criteria. 
\end{reply}

5) I am not familiar with the ``root-to-tip'' plots shown in Figure S1 but it seems to me that the slope of the line for serotype A over the period from 1995 to 2010 is not very different from that of serotype O over this limited time period and is rather different from that for the whole time period for serotype A. 
It would be useful if the authors commented.

\begin{reply}
The slope in a root-to-tip regression is a rough estimate of the evolutionary rate, and in this case the two serotypes have markedly different evolutionary rates.
In any case, since the data sets have changed substantially, so have the plots.
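
For readers unfamiliar with these plots, a minimal sketch of the underlying regression (in notation chosen for this letter) is
\[
  d_i = a + \mu \, t_i + \varepsilon_i,
\]
where $d_i$ is the genetic distance from the root to tip $i$, $t_i$ is the sampling date of tip $i$, and $\varepsilon_i$ is an error term.
The slope $\mu$ is the rough estimate of the evolutionary rate, and the intercept with the time axis, $-a/\mu$, gives a rough estimate of the date of the root.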
\end{reply}

6)      A major concern about identifying origins of samples is having adequate coverage of samples from potential sources. 
It is not entirely clear to me that the coverage of FMDV strains circulating in South America is sufficient to be able to draw good conclusions. 
It may be that it is but this is not clearly demonstrated.

\begin{reply}
Please see General Comment \#1.
\end{reply}

7)      On P.4, lines 48-50. The text indicates that the importance of long range migration routes seems to differ for the two serotypes but the confidence values for the two serotypes seem to overlap extensively, so is this a real difference?

\begin{reply}
We thank the reviewer for catching this.
We have now removed this analysis as it did not address the epidemiological question appropriately.
We now limit our discussion to qualitative observations of well-supported (BF $>3$) long-range migrations for both serotypes without attempting a quantification. 
\end{reply}

8)      Recombination within the coding region for VP1 alone is rather rare and thus I am not sure the check for recombination was very justified or useful (P. 9 lines 8-11).

\begin{reply}
We have removed this analysis.
\end{reply}


%====================
\textbf{Reviewer \#2}
%====================

Comments to the Author
The ``Spatio-temporal dynamics of FMDV in South America'' study by Carvalho et al. describes the historical spatio-temporal dispersal of FMDV across the South America continent reconstructed using phylogeography analyses employed through a Bayesian spatial diffusion model. 
The reported results define transmission networks and transmission hubs (at country level) that would explain the historical spread of FMD within the continent. 
In addition, the authors deal with variables likely associated with the FMD spread (i.e. geographical distance, livestock density, and livestock trade) in trying to explain their causative effect associated with historical FMD outbreaks. 
As last attempt, the authors correlated the demographic dynamics of FMDV with reported number of FMD outbreak and vaccination coverage in trying to assess the impact of FMD control policies on the FMDV diversity and its population expansion/contraction through time. 
The paper has been already published as an arXiv (http://arxiv.org/abs/1505.01105) the 5th of May. 
Although the methodological approach has been previously used in different setting and the results would be interesting for a computational basis, there are several aspects of the study that need to be carefully considered before the paper would be suitable for publication. 
In fact, the presence of bias in the data used is potentially producing an incorrect picture of FMD in South America. 
In addition, although the authors are examining the potential impact of the sampling bias in the sequence data analysed, this is only properly discussed as Supplementary Text and not in the main paper, where they assume the results as correct, valid and without bias. 
One major problem of the study is the data. 
They claim to have analysed all the data publicly available in GenBank but (as detailed below) this is not correct and the analyses should be repeated including the full sequence dataset. 
I would be, therefore, really cautious to draw important conclusion from this study given the issues reported, which might hold true only for the time-frame pictured from the data you have analysed. 
As already said, this study need a proper revision before being published and this main revision would involve the re-analysis of all the data adding all the sequences available in GenBank and which are not included in this version.

\begin{reply}
We thank the reviewer for such a thorough assessment of our work.
It seems the reviewer's main concerns were (i) incomplete sampling of available data; (ii) sampling bias (even in the face of all available data); and (iii) the level of generality of the results in the face of (i) and (ii).
We have now re-done the data collection and improved the data sets we analyse, adding many more sequences. %GB: improved in what way? LM: more sequences, basically.
We also employ better models that account for sampling bias both in time and space.
For more detailed information, please see the responses below and also General Comments \#1 and \#2 above.
\end{reply}

Problems of the study:
-       The authors claim to have used all publicly available VP1 sequences from GenBank, but after inspection this is not true. 
In fact, there are quite a number of sequences that has not been included in the study. If this has been done intentionally, the reason for this decision should be discussed in the paper; if not, I strongly suggest checking better in GenBank what is missing from your analyses (you have even the GenBank Accession Nos of the missing data in one of your reference [Malirat et al., 2007]). 
For the serotype O, for example, you are totally ``ignoring'' the sequences before the 1994 but, what about the O/Campos (O/Br/58) and the Argentinian samples of the 82-83 or the Caseros/67, the Selab/77? I could make the list much longer. 
This is valid for the serotype A as well (e.g. among others, where is the A10/Arg/61?). 
Therefore, if you want to claim to have analysed the complete VP1 coding sequence data for South America you need to re-perform all the analyses, because the results might provide you a completely different picture of FMD in South America (see your conclusion on the Colombian origin of the type O in South America).

\begin{reply}
We thank the reviewer for their careful assessment of our data.
It is indeed true we had not included many sequences that could otherwise have been analysed.
This has now changed and we have collected $53$ additional sequences for serotype A and $43$ additional sequences for serotype O (see General Comment \#1).
\end{reply}

-       All the results discussed on the spatial diffusion of FMDV in the whole South America continent should be treated with cautions (potentially providing you incorrect data), considering that you have: missing information from missing sequences; a sampling bias in your data according to time and country. 
It is well known that FMDV was introduced (as you pointed out in the introduction) by human migration from Europe in the end of the 19th Century with early reports in Argentina between 1860 and 1870, and 1895 in Brazil. 
There were at least two distinct introductions in the North and one in the South but, before the 1922, it is really difficult to say which serotype was (i.e. before the FMDV typing was performed). 
However, it is clear that the early spread of FMD was coming from the South. 
Argentina at the time was one of the main export hubs of livestock to the continent and even the Mexico outbreak in 1922 has been attributed by the introduction of infected cattle from Argentina. 
This holds true for: Chile, 1920 outbreak (decline of cattle industry during 1912 in Chile with large introduction from Argentina); official report in Venezuela 1950 (potential from importation of Argentinian meats/livestock back to the 1947). 
It might worth to know that the Andes acted as a barrier for taking FMD out of Chile and the western part of South America, until when the regional animal movements and trade primarily caused the spread. 
The countries of the Rio de la Plata, which were sharing the Pampas ecosystem, experienced an early wave of disease spread and by the 1920 FMD was in Uruguay, Paraguay and Brazil. 
Historical data suggest that type O was introduced most likely from the South (maybe Argentina) and type A introduced from Europe. 
From your analysis the initial historical FMD wave has not been characterised (in the years before 1994 for type A; very limited and potentially biased before 1965 for type O). 
The only part which might be more realistic is the FMD transboundary movements within the ``countries triangle'' of Colombia, Ecuador and Venezuela, that could sounds more like from Argentina-Venezuela-Colombia-Ecuador, even though you have the effect of sampling bias that needs to be discussed.

\begin{reply}
We share the reviewer's concerns that sampling bias might be an important factor in our analyses.
This has been addressed using state-of-the-art phylodynamic methods which accommodate both temporal and spatial sampling bias.
While not a panacea, these methods are the best one can do in the face of incomplete and potentially biased data.
For more information, please see General Comment \#2.
\end{reply}

-       Although you presented some data on sampling bias in your Supplementary Text (but this might be not satisfactory enough given the problem in the dataset), there is a real problem of sampling bias and this need to be addressed in your main text as well. 
How does the model deal with missing links? 
Is the prediction robust enough to provide a clear indication of virus spread in such a large geographical range? 
This could be a serious limitation (and problem) of your study. For serotype A, you analysed 131 VP1 sequences of which 44\% are from Argentina (of which $\sim$70\% are from 2000 and 2001) and 21\% from Venezuela (of which $\sim$86\% are recent samples - after 2001). 
In addition, the majority of your oldest samples are only from Brazil and Argentina. For serotype O, you have 167 sequences in total of which $\sim$54\% are from Ecuador (all after the 2002 and have been previously analysed - along with 30 sequences that have been included in this manuscript as well). 
Therefore, you have 90+30=120 sequences already analysed in a previous paper. 
Among the other, 36 sequences from Colombia ($\sim$22\% of the total) are barely covering the 2000s (as you claim 1994 to 2008), since you have 5 sequences from 2000, 1 from 2002 and 2 from 2008, a gap of 6 year. 
For the type A database, your oldest samples are only from Colombia. You attempted a random sub-sampling that, as far as I understood, have not taken into account the time of sampling, but just the quantity of data from each country. 
Maybe you need to account for time in your sub-sampling.

\begin{reply}
We now consider a model that explicitly accounts for temporal biases in the sampling process. 
\end{reply}

-       When doing analysis on sequences extracted from GenBank a detailed list of the sequences with their GenBank Accession No (along with associated metadata) should always be provided. 
Although a webpage (but this is difficult to check and, probably, the majority of the readers would not bother to access to your website) has been set up for the paper there are no GenBank references for the serotype A. 
This information should be included either as a table in the main text or as a S3.

\begin{reply}
This list is now available in Supplementary Material.
\end{reply}

-       This paper has been already published as an arXiv (http://arxiv.org/abs/1505.01105) which has been submitted the 5th of May and updated the 2nd of June (I received this review the 12th of June). 
Although it is common for theoretical maths, physics and computer sciences studies to be published as arXiv before being properly peer-reviewed, this is not the case with study dealing with topics as in this case. 
Since the study needs a substantial review and re-analysis of the data, I would suggest to withdraw your submission to arXiv

\begin{reply}
Our submission of the initial version of our manuscript to arXiv does not pose any conflict with the journal's policy.
Additionally, arXiv and bioRxiv are well known at this point to host pre-prints that have not yet been peer-reviewed, a practice that is commonly accepted in our field of research (as can be seen from the many pre-prints circulating without peer review on SARS-CoV-2).
\end{reply}

Major Comments:
-       Page 1 Line 36: ``Our dating''. 
This result is only compatible with your dataset and must not be related with the incursion of FMD in South America. 
Your dating is referred to the MRCAs of the data you have analysed but not the MRCAs of both the type A and O clades in South America. 
As already detailed the occurrences of both serotypes are much earlier than your estimates, which therefore are misleading in the description. 
If you comment on the South America FMD phylogenetic history, you need to do that only in line with the data you analysed and not as a general picture.

\begin{reply}
Whilst both serotypes may have first occurred much earlier than the estimates we obtain, our estimates refer to the origin of the~\textit{circulating} strains.
It is entirely possible that FMDV has been introduced into South America several times.
\end{reply}

-       Page 2 Line 7: ``By the 1970s''. 
Again this is not true. 
During 1950s FMD was already causing problems in Argentina, Brazil, Chile, Peru, Uruguay, Venezuela, Colombia, and Ecuador. 
This picture is larger than a regional scale.

\begin{reply}
``Causing problems'' is not the same as having widespread epidemics.
See the reference we give~\citep{Saraiva2003} for more details.
\end{reply}

-       Page 2 Line 33: ``using all''. 
You are not using all the sequences available in GenBank. 
For example and as already commented, I cannot find the O Campos in your fasta file (and this is only one). 
I strongly suggest doing a better search and re-perform all the analyses.

\begin{reply}
This has now been done.
\end{reply}

-       Page 3 Line 20: ``the time of the most recent''. 
You need to clearly state here that these estimates hold true only for your sequences analysed (MRCSs of the data) and not of the entire South America because saying that is misleading and incorrect.

\begin{reply}
We have now clarified that the estimates as presented pertain to the data set at hand.
\end{reply}

-       Page 3 Line 21: ``indicating a more recent origin''. 
Again this is only true for your data and should be stated. 
Serotype O outbreaks have been reported in South America since the initial wave, but clear reports start from 1940-50 (e.g. massive outbreak in Peru in 1962; 1950 official report in Venezuela; 1957 A, O and C in the entire Rio Grande do Sul).

\begin{reply}
Again, this is the origin of the circulating strains, which is all that can be said~\textit{from any particular sequence data set}.
\end{reply}

-       Page 3 Line 22: the results show a faster clock for the type O than the A (even this is not really a lot faster - considering the VP1 only there is a difference between the two of $\sim$4nt changes per year). 
Might this be due to the different molecular clock model used for type A and O?

\begin{reply}
The difference is not due to the choice of model because we considered the same set of molecular clock models for both serotypes, selecting those that provided the best fit for each.
Rate estimates are, however, broadly consistent across models for each serotype.
\end{reply}

-       Page 4 Line 49: ``Remarkably''. 
You claim that there is a difference between long-range migration routes between serotypes (reported as 0.14 and 0.05 for type A and O, respectively). 
However, the 95\% intervals are really similar and both containing the zero, I should then say that this is not so remarkable.

\begin{reply}
This analysis has now been excluded from the manuscript.
\end{reply}

-       Page 4 Line 57: ``The most probable''. 
This result might indicate that the 2001 reappearance of FMDV in Argentina was a persistent virus foci (maybe carrier?) or maybe some missing links (i.e. sampling bias) exists in your data which are not including contemporary isolates from neighbouring countries (besides Brazil) and, therefore, this would impact on your results. 
Since this would be quite an interest topic (even though only on a retrospective line), you need to discuss this in more details, assessing as well the validity of your results.

\begin{reply}
Our GLM analyses attempt to account for sampling bias, and find only mild evidence of bias (BFs $<3$ indicate weak support, see Figure 5 in the revised manuscript). 
We have now expanded the discussion of these issues in the revised manuscript.
\end{reply}

-       Page 5 results on the Venezuelan origin of Andean FMDV spread: 
Considering that you have a bias in your sequences for type A, you need to really consider with caution your results and discuss more about the impact it might cause.

\begin{reply}
The question of bias has now been addressed more thoroughly (see General Comment \#2).
\end{reply}

-       Page 5 Line 16: ``Similar to what was found for Venezuela''. 
You present data on Colombia saying that this results is similar to type A for Venezuela? however, I would rather imagine that you are discussing about source of type O in the north from Colombia. 
Is this true? If so, please rewrite the sentence to make that clear. 
In addition, since you have all the oldest historical samples of type O from Colombia, the logical reasoning would be that of course the analysis point to Colombia as the main transmission hub. 
But, what about the sequences you have not included in the analysis? 
Should this provide you a different picture? 
You need to clearly discuss this issue (and of course re-perform the analysed including all the data available)

\begin{reply}
The discussion of the spatial origins results has now been completely re-written, among other reasons to accommodate the new sequences being analysed.
While common sense would dictate that the location with the oldest samples would be inferred as the root, this is simply not true for the CTMC model we employ: the root state can potentially be any of the sampled states (countries).
\end{reply}

-       Page 5 Lines 34-38 and following paragraph (Lines 41-57): ``or serotype A''. 
This sentence is really confusing and need to be better formulated. 
For type A you find that geographical distance drives the diffusion, whilst this has a higher statistical support for type O but not like the cattle exchange. 
Now, the question is, what cattle exchange means? 
This implies geographic distance as well (because you are defining trade between countries, which in South America are not so very close), isn't it? 
So, the geographic distance is the main effect of FMDV diffusion or a confounding effect for cattle trade? 
I am really struggling to find a logic behind this results (or its analytical approach) considering that you have a strong bias in your data (both spatially and temporally) and you are analysing the geographical distance and trade (both cattle and swine) variables separately? have you checked for multicollinearity?

\begin{reply}
The confusing sentence has now been re-written.
We now employ a GLM approach to modelling the factors associated with spread, which allows us to account for multicollinearity and sampling bias simultaneously.
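
As a sketch of the general form of such a model (the notation is introduced here for illustration only, and the exact specification in the manuscript may differ), the rate of spread $\lambda_{ij}$ from location $i$ to location $j$ is modelled log-linearly in $K$ predictors:
\[
\log \lambda_{ij} = \sum_{k=1}^{K} \beta_k \, \delta_k \, x_{ijk},
\]
where $x_{ijk}$ is the value of predictor $k$ for the pair $(i,j)$, $\beta_k$ is its coefficient and $\delta_k \in \{0,1\}$ indicates whether predictor $k$ is included in the model.
Because all predictors enter jointly, correlated predictors such as geographic distance and trade compete to explain the same rates instead of being assessed one at a time.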
\end{reply}

-       Page 6 Line 9: Sensitivity analysis. 
The sub-samples (as referred in table S4 and S5) is excluding the over-represented countries, i.e. Argentina and Colombia. 
However, if you exclude Argentina from the type A data, you have now Venezuela that is over-represented (the same holds true for type O, for which Colombia is the over-represented after the exclusion of Ecuador). 
Since both the Argentinian and Ecuador samples are, let's say, monophyletic (collected for the majority within epidemics), this are not really changing the global picture. 
In addition, you claim (Page 24 Line 37) that removing Argentina from the type A analysis move the MRCA estimate of $\sim$6 years. Is this because you eliminate one of the oldest sequences present in your data, thus leaving only the Brazil '58 and, therefore, introduce a more substantial sampling bias/uncertainty? 
For type O, considering that the oldest samples are from Colombia, removing Ecuador has no impact in the results. 
I am getting confused to understand which methodology is behind your random sampling approach used for the 5 sub-sampling. Is this a proportional random sampling (with a temporal sub-sampling as well) of each country?

\begin{reply}
The sub-sampling analyses have now been removed from the manuscript.
The reasons for this are two-fold.
First, the number of possible sub-samples grows combinatorially with the number of strata one wants to consider.
Second, it would be hard to combine results from several hundred replicates in order to assess whether sampling bias played a role.
Instead, we now adopt a modelling framework that accounts for temporal and spatial sampling bias directly, in a model-based fashion.
\end{reply}

-       Page 6 Demographic reconstruction: You are discussing the increase/decrease in FMDV diversity according to the reported activity of FMD and the control policies (i.e. vaccination) imposed. 
However, it seems that it is difficult to correlated like-with-like in your graph(s): you have doses/head of vaccine (this could be monovalent, bi-, tri-; strain(s) used), no of FMD cases (I suppose reported no of outbreak - this could be 1 individual of 1000s of animals infected) and viral diversity. 
One point that is completely missed in your discussion is the vaccine efficacy and this might impact in your analysis (i.e. some reports of drop in efficacy of the O campos vaccine). 
In addition, you comment that after 2001 an increase in vaccine doses resulted in a decrease in viral diversity: although from the FMD outbreak data is true for type A, it is not clearly valid for type O, which maintained a more stable trend (of course with some fluctuations). 
Is this, again, an issue due to bias in your data? A previous study (de Silva et al., 2012) describes how BSP incorrectly reconstructed a decrease in the last part of a datum epidemic when the population was still growing. 
This problem was related to the lack of genealogical information at later times. Would this be the case for your analysis as well?

\begin{reply}
We thank the reviewer for this astute observation.
The preferential sampling models considered in the revised manuscript do show that the corrected $N_e(t)$ plots (Figure 2, right panel) are somewhat different from naive estimates.
\end{reply}

-       Page 7 Line 31: ``the inclusion of archival''. 
You are discussing about the impact of using an outgroup into your analysis and fail to analyse that (although you could easily extract some sequences from GenBank). 
In addition, you claim that the type O was circulating in the continent with its root in Colombia but you do not include any samples prior to the 1994. 
This analysis is incorrect and should be appropriately revised.

\begin{reply}
This issue has already been addressed in previous responses.
\end{reply}

-       Page 7 Line 55: ``Previous studies''. 
It seems that the study you referred describes similar cycle of 4-5 years for both serotypes and, moreover, this would be really complicated and dangerous to apply as a general rule (since it is a country-based estimate). 
I would suggest deleting this sentence. In addition, from you skyride plot it is difficult to say that a 4-5 year FMDV cycles exists.

\begin{reply}
Done.
\end{reply}

-       Page 8 Line 8: ``The diversity bottleneck''. 
You commented about the bottleneck for the type A diversity reconstruction as an effect of FMD epidemics affecting several countries after the 2000, but I would remind you that the majority of your samples (collected mainly from epidemics) are from the 2000 afterwards. 
Therefore, this again would be a confounding effect due to bias in your data (i.e. is the skyride estimate affected by the number of coalescent events in your phylogeny?).

\begin{reply}
The new preferential sampling models should accommodate this (see Figure 2 in the revised manuscript).
\end{reply}

-       Page 8 Lines 18-21: ``Our results suggest''. 
This is incorrect and needs to be properly assessed when a more comprehensive analysis, which would include all the type O isolates, has been performed. 
If the paper of Carvalho et al., 2013, indicates the same results (i.e. describing the origin of the FMDV serotype O in South America from Colombia using the very same data), that needs a proper review as well.

\begin{reply}
It is not incorrect to say that our results suggest a particular inference.
Ultimately, there is a limit to what can be said from limited observational data; we do our best to accommodate potential biases in the sequence sampling.
\end{reply}

-       Page 8 Line 26: ``viral effective size''. 
This is a reminder about the previous comment on Demographic reconstruction.

\begin{reply}
Acknowledged.
\end{reply}

-       I am not familiar with the methodology behind that, but I suppose that the analysis of epidemiological predictors (i.e. cattle and pig trade/livestock data) seems to have been constructed around an ``average'' value which potentially does not describe the space-time trends of trade routes and animal movements. 
Does this have an impact on your results generated? 
If so, what is the validity of those results? Please, comment on this.

\begin{reply}
We now employ a generalised linear model (GLM) approach that allows us to split the trade data into separate time periods as well as include all predictors at once.
\end{reply}

-       You might consider using some sequences (e.g. O BFS 1860) as outgroup for your phylogenetic reconstruction and perform a more detailed analysis that would include all your sequences (maybe using a random local clock to account for variability in the rates), therefore shaping the entire tree topology. 
In addition, the phylogenetic trees, as are presented now, are confusing and really difficult to read.

\begin{reply}
The analyses presented here pertain to rooted time-trees, in which the position of the root is inferred under the molecular clock model from the dated tips.
As such, the suggestion of using an outgroup does not apply.
\end{reply}

Minor Comments:
-       Page 1 Line 34: ``environmental''. 
Are you really using environmental data (e.g. air and bathing water quality)? 
Or do you mean livestock population and trade data, so more epidemiologically-related data or population data?

\begin{reply}
We thank the reviewer for this comment.
We now refer to the data collected for this paper as epidemiological and population data, and no longer use the term ``environmental''.
\end{reply}

-       Page 1 Line 42: ``Our findings''. 
This is a general sentence and might lead to the assumption that evolutionary and spatial dynamics of serotype A and O are globally different (which might be not the case). 
Just highlight the South America setting.

\begin{reply}
The sentence has been re-written.
\end{reply}

-       Page 1 Line 50: ``, the most important''. 
Is FMD the most important animal disease or, better, is one of the most?

\begin{reply}
It seems to be the case, especially for countries such as Brazil and Uruguay, which depend on meat exports.
\end{reply}

-       Page 2 Line 17: References 4 and 14 are duplicates.

\begin{reply}
We thank the reviewer for catching this.
\end{reply}

-       Page 2 Line 17: Use Di Nardo et al. [12].

\begin{reply}
Done.
\end{reply}

-       Page 2 Line 18 and Line 20: ``environmental''. 
Check the meaning of ``environmental data'' with what you are trying to analyse.

\begin{reply}
Done.
\end{reply}

-       Page 2 Line 28: ``in the continent''. 
Which one? 
Do you mean at ``continental'' level? 
Or you are only referring to South America?

\begin{reply}
We are referring to South America, which is a continent, so both statements would be correct.
\end{reply}

-       Page 3 Line 48: ``..we employ an asymmetric''. 
You have already detailed your analysis procedures in the Material and Methods section. 
This could be deleted.

\begin{reply}
Done.
\end{reply}

-       Page 5 Line 12: ``We provide evidence of Venezuela..'' 
Which region you are referring to? 
The Andean region? 
Please, specify.

\begin{reply}
Addressed.
\end{reply}

-       Page 5 Line 26: ``trade and viral diffusion, we collected''. 
I think it would be better to say ``we obtained data from'' because maybe you have not been in the field collecting data.

\begin{reply}
Corrected. %GB: I would just make the reviewer happy here, and say: corrected. LM: grrr, done begrudgingly
\end{reply}

-       Page 7 Line 22: I would rather use: ``see Figure 3 in [6]'' or ``see FMD historical outbreak data in [6]''.

\begin{reply}
Re-written.
\end{reply}

-       Page 8 Line 7: ``..for viral Ne in both''. 
You previously discuss about viral diversity and now present the effective population size (Ne). 
Since the skyline family is based on the $\theta = N_e\tau$, you should describe Ne only if you have extracted that estimate with the appropriate measure of generation time, which you are not having or even discussing (i.e. in your graph you should report in your y-axis legend the compound value as well - $N_e\tau$). 
I suggest using viral diversity throughout the paper.

\begin{reply}
This has now been resolved. Thank you. %GB: simply replace in the text? Or avoid the use of Ne altogether and simply mention 'effective population size'? LM: think it's not a problem anymore.
\end{reply}

-       Page 9 Line 13: It seems you used BEAST 1.7.5 - or even and older 1.7.2 version - (from your web available .xml) to perform the analyses, although a more recent version is available (1.8.2). 
Is the latest version more robust and efficient in the results generated? 
Have you re-analysed your data using the latest version and producing similar results?

\begin{reply}
All analyses have now been conducted with the latest stable version of BEAST (v1.10). 
\end{reply}

-       You have a type O sequence from Peru (GenBank Accession No. HQ695844.1) you say collected in 2004, whilst in your previous paper on Ecuador is defined as 1994. 
I checked in GenBank and this is from 2004. 
Therefore, you need to amend your previous paper with the correct date.

\begin{reply}
We thank the reviewer for catching this.
\end{reply}

%====================
\textbf{Reviewer \#3}
%====================

Comments to the Author: Your reviewer is not an expert in phylogeographic or phylodynamic analyses, but does have experience in Bayesian analyses in general and FMDV epidemiology and evolution.

The authors present an analysis of the spatio-temporal dynamics of FMDV in South America. 
They conclude that serotypes O and A behave quite differently, with different rates of evolution, different circulation networks, and different predictors of spread. 
Their analysis is based on $\sim$300 VP1 sequences retrieved from public databases, and combines a series of Bayesian phylogenetic/geographic/dynamic analyses to come to its conclusions.

It is unfortunate that there is such limited sequence data to work with, especially since it is only for VP1, a very small component of the FMDV genome (albeit a very variable one), but I am concerned more by other problems with the underlying data on which it is based, and the soundness of the resulting conclusions. 
Assuming these concerns can be addressed, however, the paper provides new conclusions on the spread of the disease in South America, will be of interest and value to the FMD research community, and should be published.

\begin{reply}
We thank the reviewer for their assessment.
Some of the concerns raised have already been addressed in previous responses.
\end{reply}

This would be especially true is there were any evidence in terms of the known epidemiology of the disease which might support the assertion that it differs so strikingly between the serotypes, such as a differential likelihood in different serotypes of airborne transmission (which might favour geographical spread?) or subclinical infection and subsequent transmission (which might favour transmission despite inspection as a result of trade?).

\begin{reply}
We thank the reviewer for this important, thought-provoking comment.
An alternative explanation for the differences observed might be the stochasticity involved in the epidemics: even minor differences in transmissibility or incubation period, say, might be amplified in terms of attack ratio and other population-level variables.
In other words, it might be the case that highly variable population processes are actually responsible for most of the observed variation.
We have now added a paragraph at the end of the Discussion expanding on this topic.
We again thank the reviewer for reminding us of this important caveat.
\end{reply}

My specific major concern is with the effect of differential sampling effort resulting in different numbers of samples in each country, rather than this being a feature of the epidemiology of the disease. 
Chile, Guyana, French Guiana and Suriname have no FMDV sequences, and Paraguay has no A sequence even though wrlfmd.org shows that Paraguay, Guyana and Chile have experienced recorded outbreaks. 
It is very likely that French Guiana and Suriname have too. 
Peru, Paraguay and Uruguay also have very low numbers of samples in total. Some work has been carried out to investigate the sensitivity to spatial sampling heterogeneity, but not with respect to the predictors of spread. 
Depending on whether there is detailed information on the location of VP1 sequences within countries, this inference could depend strongly on these poorly sampled (or unsampled) countries, and it would seem important to investigate whether this effect alters the conclusions, perhaps by removing the poorly sampled countries (since we can't add in potential missing countries), and just investigating spread between the countries with high sample numbers for the serotype.

\begin{reply}
This is a legitimate concern. 
As discussed previously, we prefer a ``complete-data'' approach to assessing sampling bias: we fit a GLM with a stringent, sparsity-inducing prior, and include as many predictors associated with sampling bias as possible (the difference and product of the numbers of sequences, and the number of sequences at the origin and at the destination).
The analyses in Figure 5 of the revised manuscript show that none of the ``sampling bias'' predictors achieves an appreciable level of support (Bayes factor).
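
For reference, assuming the common set-up in which each predictor $k$ has a binary inclusion indicator with prior inclusion probability $q_k$ and posterior inclusion probability $p_k$ (notation introduced here for illustration only), the Bayes factor in favour of including predictor $k$ is the ratio of posterior to prior odds:
\[
\mathrm{BF}_k = \frac{p_k / (1 - p_k)}{q_k / (1 - q_k)}.
\]
A Bayes factor close to 1 therefore corresponds to a posterior inclusion probability close to its prior value $q_k$, that is, to data that are essentially uninformative about that predictor.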
\end{reply}

It would also seem important to investigate whether the reason for the difference in observed predictors relates not to the different epidemiology of the serotypes, but to the different control policies in the different time periods studied. 
This could be investigated by doing all of the inference for serotype A again based just on the recent data.

\begin{reply}
This again is a very astute observation, for which we thank the reviewer.
We posit that the (temporal) preferential sampling models should capture this effect -- should it exist -- albeit imperfectly, through the coefficient of $-t$.
Unfortunately, current state-of-the-art methods do not offer an easy way of incorporating more complex time-varying covariates.
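
To make the role of this coefficient concrete, here is a sketch of the kind of sampling model we have in mind, with notation introduced for illustration only and not necessarily matching the manuscript. Sampling times are treated as an inhomogeneous Poisson process with intensity $\lambda_s(t)$, where
\[
\log \lambda_s(t) = \beta_0 + \beta_1 \log N_e(t) + \beta_2 \, (-t).
\]
Here $\beta_1$ measures the dependence of sampling intensity on the effective population size, and $\beta_2$ captures a log-linear trend in sampling intensity over time that is not explained by $N_e(t)$.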
\end{reply}

Also, while it seems plausible to suppose that there might be a different substitution rate, this difference seems high, and a casual inspection of Figure S1 also suggests that there might be much faster substitution rate in A during the period for which O data exists, though I don't know why that might be the case.

\begin{reply}
Differences this large do exist among FMDV serotypes (see for instance~\cite{Tully2008}).
We agree that some of the differences could be caused by differences in sampling, among other factors, which is why the estimated substitution rates should not be over-interpreted.
See~\cite{Holmes2016} for a nice discussion of why differences in substitution rate may not, and often do not, reflect differences at the replication level.
\end{reply}

Finally, I see no explanation or justification for why the molecular clock model should be different between two serotypes of the same virus circulating through the same species in the same region. 
Some explanation would seem appropriate, or an investigation of what the implications for other conclusions might be if this might not be the case.

\begin{reply}
See our comment above.
We have now added a reference to the Holmes et al. review and added a couple of sentences to the Discussion further reinforcing this point.
Thank you for drawing our attention to the issue.
\end{reply}

Minor detail:

Kullback-Leibler is occasionally misspelt as Kullback-Liebler.

\begin{reply}
Fixed. Thanks.
\end{reply}

\clearpage

\bibliography{FMDV_AMERICA}

\end{letter}

\end{document}