Adding the PRRSV2 nsp3-4-5 dataset from Baby et al. 2026 (under review) #458

Open
LMIVV-medvet wants to merge 2 commits into nextstrain:master from LMIVV-medvet:master

@LMIVV-medvet


Description of proposed changes

We want to add a new community dataset to Nextclade based on the nsp3-4-5 region of PRRS virus type 2 (PRRSV2). We would like it to be released upon publication of the paper, which is currently under review, or earlier if the reviewers ask for it.

Checklist

  • Check if changes affect downstream workflows which depend on this dataset. For instance, Nextstrain ingest workflows may break if clade nomenclature changes. Consider fixing those workflows, or at least opening an issue.

@rneher rneher mentioned this pull request Jun 22, 2026
@rneher

rneher commented Jun 22, 2026

Member

Thanks for submitting this PR. I made a preview link:

https://master.clades.nextstrain.org/?dataset-server=gh:@LMIVV

I also have a few concrete questions:

Your dataset name has both BabyV and Baby2026, is this intended?

community/UdeM-LMIVV/BabyV/PRRSV2/nsp3-4-5/Baby2026

The lineages/clades defined are not always very well separated and some seem very rare. There might not be a better solution to this, but I thought I'd flag it:

The tree seems to be rooted on n68. There is no need to root on a specific strain, and a separate rooting might be closer to what is biologically relevant.

@rneher

rneher commented Jun 22, 2026

Member

Regarding release: this can be released once the technical and biological questions about the dataset are resolved. Once it is released, it can only be updated, not removed.

@LMIVV-medvet

Author

Hello,

Thank you for reviewing our dataset. Here are our answers to your questions:

-Your dataset name has both BabyV and Baby2026, is this intended?
Yes, this was intentional. We used BabyV as recommended for the directory structure and Baby2026 to point to the paper. However, we are happy to change both to either Baby2026 or BabyV if you prefer; we really don't mind.

-The lineages/clades defined are not always very well separated and some seem very rare.
Indeed, the number of complete PRRSV2 genomes is not very high (we curated ~1800 whole-genome sequences to build the dataset), so we believe we are underestimating the diversity of the virus. For this first version of the dataset, we chose to keep the rarer clades to preserve as much diversity as possible, knowing that these clades may prove to be either oddities or more relevant as new sequences become available. We will adjust them accordingly in subsequent releases.

-The tree seems to be rooted on n68. There is no need to root on a specific strain, and a separate rooting might be closer to what is biologically relevant.
Yes, we rooted the tree on the sequence used as the reference for the alignment. I agree that another sequence might be a better choice biologically, or that we could simply leave the tree unrooted. That reference was chosen mostly for historical reasons and for its status as a RefSeq reference genome, which is why we thought it relevant to root the tree on that sequence. Clade n68 is also of high interest: it is the most prevalent clade worldwide, as it corresponds to the most widely used vaccine strain as well as to the reference. However, we are open to making changes if you require them.

Thank you very much,

@rneher

rneher commented Jun 27, 2026

Member

Thanks for following up and answering my questions. I asked these mostly to make sure they were conscious choices. If you are happy with how the dataset performs, that is fine by me.

Regarding the root: it is true that Nextclade requires a specific reference to align to, and historically this had to be the root. But we can now separate the two, so if you'd rather root at the midpoint, feel free to change it. There is a short note on this in the FAQ. Otherwise, I'm happy to keep it as is.
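Midpoint rooting places the root halfway along the longest tip-to-tip path in the tree, so it does not depend on any particular strain. As an illustration only, here is a minimal pure-Python sketch, assuming the unrooted tree is given in a hypothetical format of `(node, node, branch_length)` edges with non-negative lengths; it is not how Nextclade or the Nextstrain tooling implement rooting. It finds the two most distant nodes with two depth-first sweeps and reports the edge on which the root falls, plus the offset from one end of that edge.

```python
from collections import defaultdict


def farthest_from(adj, start):
    """Depth-first sweep from `start`; return the farthest node, its distance,
    and the per-node distance and parent maps (valid because adj is a tree)."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:  # each node is reached exactly once in a tree
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], dist, parent


def midpoint_root(edges):
    """Locate the midpoint root of an unrooted tree.

    `edges` is a hypothetical input format: (u, v, branch_length) tuples with
    non-negative lengths. Returns (upper, lower, offset): the root lies on the
    edge between `upper` and `lower`, at distance `offset` from `upper`.
    """
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))

    # Double sweep: the node farthest from any start is one end of the longest
    # path; the node farthest from that end is the other end.
    a, _, _, _ = farthest_from(adj, next(iter(adj)))
    b, diameter, dist, parent = farthest_from(adj, a)
    half = diameter / 2.0

    if parent[b] is None:  # degenerate tree with zero diameter
        return b, b, 0.0

    # Walk from b back towards a (dist[] is measured from a) until the next
    # step would drop below the halfway point.
    node = b
    while parent[node] is not None and dist[parent[node]] >= half:
        node = parent[node]
    upper = parent[node]
    return upper, node, half - dist[upper]
```

For example, `midpoint_root([("A", "X", 1.0), ("B", "X", 2.0), ("X", "C", 4.0)])` returns `("C", "X", 3.0)`: the longest path runs from B to C (length 6), so the root sits 3 units from C on the edge joining C and X. In practice a phylogenetics library would normally do this; for instance, Biopython's `Bio.Phylo` trees provide a `root_at_midpoint()` method.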

Let me know how you want to proceed and what release timeline you have in mind.
