K 2 to K 18, AF122-AF165

Posted by – December 17, 2011

If you have an AF ID, please edit your ancestry information if you haven’t.

All the files are uploaded here. NextGen Gallery seems to have broken for some reason, so I can’t insert all the images in this post. But they’re all in there. The Fst values between the populations are in the “log” files.

You might want to check out Tips for “reading” the ADMIXTURE plots.

Some people had to be omitted because I haven’t gotten around to figuring out FTDNA format, etc.

The project is not dead

Posted by – November 26, 2011

I’ve been very busy with life. But I sent a bunch of IDs out this weekend. I probably won’t be able to do relatives though. The administrative overhead is too high, and I’m very pressed for time right now….

‘africa9’ calculator

Posted by – September 15, 2011

‘africa9’ calculator:

The calculator combines data from Henn et al. (2011), HGDP, and Behar et al. (2010). As a result, the number of SNPs is small: there is probably noise in the minor components, but the major components of one’s ancestry should be well-defined.

It should be used only by Africans and African-West Eurasian admixed individuals. It is not meant for people with additional admixture (e.g., South/East Asian or Native American).

Analysis of a Tutsi genotype

Posted by – August 30, 2011

With this post, “Tutsi probably differ genetically from the Hutu,” I hope to tamp down all the talk about how the Belgians invented the Tutsi-Hutu division. After I put the call out, it took 2 months for me to get my hands on a genotype, and less than 24 hours to post some results.

August Update

Posted by – August 5, 2011

I ran all the non-relatives in batches of 50 (the last batch was 25 actually). I’m going to post it when I have time, but I’ve been swamped with some things in my non-internet life (all good, don’t worry). Some of you couldn’t access the google docs, so I wanted to create a zip file with everything working correctly. I’m also going to upload the raw results so that people can replicate. Then I’m going to run the relatives I received, and think about other analytic techniques. My goal is to get a lot more done between now and September 15th, because after that I’ll have lots of other obligations.

Until then, here’s my first pass at a Malagasy genotype.

Quest for the Malagasy genotype

Posted by – July 28, 2011

I would like to throw out the word that I am looking for a person with Malagasy ancestry for the African Ancestry Project. To my knowledge there are no thick marker autosomal analyses of the Malagasy people. After my recent exploration of Southeast Asian genetics I think even one individual would be highly informative.

As usual I would guarantee that these data are entirely private, and I do not share them with anyone. But in this case I would like to make an exception and stipulate that Joseph K. Pickrell, a graduate student at the University of Chicago, would also be very interested in access to a Malagasy genotype for the purposes of research. Since this is an undersampled population, the marginal returns to a Malagasy genotype would be enormous for science: a public good rather than just a private gain.

Also, I am still looking for a Tutsi genotype so that I can ascertain the origin of this population.

Please contact me at africanancestryproject -at- gmail -dot- com.

DIY Dodecad

Posted by – July 27, 2011

I know I’ve been tardy on this project of late because of various other obligations, so please check out DIY Dodecad. It’s not as fine-grained as some of you might want, but it’s definitely a good first pass.

23andMe’s new Roots project

Posted by – July 26, 2011

I think participants here will be interested:

A recent article in Wired Magazine highlighted how the genome revolution has been skipping most people in the world: 96% of participants in recent genomic studies trace most of their ancestry to Europe. Why? Statistical analysis is simpler in groups tracing ancestry to just one continental region so fewer individuals are needed to make discoveries. Although African Americans typically trace about 20% of their ancestry to Europe, studies to verify previous findings in this population have not been done for many diseases. Our understanding of how DNA influences disease risk in people with mostly non-European ancestry has a lot of catching up to do.

23andMe hopes to bridge this growing divide through Roots into the Future, a research initiative addressing the needs of the African American community. Our partners in the research initiative include Dr. Henry Louis Gates and the W.E.B. Du Bois Institute at Harvard, as well as advisors from academia, industry and the 23andMe community. Our goal is to enroll 10,000 participants who self-identify as African American, Black, or African in order to rapidly accelerate genetic research in the African American community.

Roots into the Future will help determine how genetic factors contribute to the development of disease in this population. Which genetic associations identified in Europeans also apply to African Americans? Can we discover new genetic markers linked to conditions of particular relevance to the African American community, such as diabetes, prostate cancer, and heart disease?

The initiative aligns with 23andMe’s broader mission of empowering individuals to understand their own genetic data. And 23andMe’s unique web-based research platform can accelerate critical research in this community.

Project participants will receive free access to their personal genetic data used for the research, as well as health and ancestry interpretations of the data. As the project progresses, participants can expect to see additional relevant reports and features.

Roots into the Future will launch at the end of July at the annual conference of the National Urban League in Boston. To learn more about the project, or to sign up to be notified when registration becomes more broadly available, go to www.23andme.com/roots.

K 4 to K 14, AF112-AF121

Posted by – July 14, 2011

If you have an AF ID, please edit your ancestry information if you haven’t.

All the files are uploaded here.

Relative Run A

Posted by – July 14, 2011

Enter the info in the ancestry spreadsheet.

The files are here.

Relative Run B, IDB1, IDB2

Posted by – July 14, 2011

Made an error, so that RB1 = IDB1 and RB2 = IDB2.

Enter the info in the ancestry spreadsheet.

The files are here.

Updates

Posted by – July 11, 2011

1) I am doing two relative runs right now. Only 6 individuals actually. If you haven’t, please resend your relative information with this format.

2) I am running AF112-AF121. Samples have trailed off, so I’m going to do a final rerun with the older samples in batches, and then start doing different things like supervised runs, different tools, etc.

3) Remember to double-check the ethnic ancestry spreadsheet.

Back soon….

Posted by – July 6, 2011

Sorry about the radio silence. I’ve been busy with other things (in case you don’t know, in June I was contributing to 5 other weblogs!). I’ll be rerunning the samples I have again, and getting the relative runs out there, probably this week. I haven’t gotten many samples recently, so the pent-up demand is probably getting exhausted. Which is fine; we’ll figure out new things to do with the data (someone suggested HapMix).

Speaking of which, does anyone know of any Malagasy or Tutsi genotypes out there?

Cape Coloured analysis

Posted by – June 16, 2011

I got some Cape Coloured samples today. Since the AAP reference set is Afrocentric I wanted to run them with an Indian and East Asian data set first. You can see the results at my other weblog. No big surprises. Cape Coloured have most of the World Island’s variation.

Update on the “relative runs”

Posted by – June 15, 2011

First, if you haven’t, check out Interpretome.

Second, I’m finally going to start doing “relative” runs. But I have such a backlog that I’m going to have to ask you to help me out if you want me to run relatives. The main issue is that I’m going to have to do them in distinct batches, because related people ruin the results. Some of you have one relative, others have five, of which four might be related to each other, while one is related only to you (e.g., 4 paternal relatives vs. 1 maternal relative).

The accounting for this is kind of a nightmare if I have to go back, so I’m going to ask you to resend the raw data for your relatives. But I need a specific format. I want you to rename the files like so:

First, give a number, starting with “1,” to each person who needs to be run separately. So if you are sending me your siblings, or your siblings and your uncle, or your cousin and your uncle, these are all related individuals, and they need separate numbers so I know who is related to whom. Second, add another number, separated from the first by an underscore and also starting with “1,” to distinguish individuals who are related to you but unrelated to each other, so that I can run them in the same batch. If you have only one relative, just add underscore “1.”

Since that was probably confusing, I will give you examples.

If you send me 4 siblings, these are the filenames I will want:

1_1
2_1
3_1
4_1

This means they run in four separate batches, and each one is the only addition to my pool in its batch.

If you send me a maternal uncle and a paternal cousin:

1_1
1_2

Since these two individuals are unrelated, I can run them together. But they need to be distinguished, so _1 and _2.

If you send me three siblings, a maternal uncle, and paternal cousin:

First, the siblings:
1_1
2_1
3_1

Next you need the maternal uncle separate as well:

4_1

But, since the paternal cousin is unrelated to the maternal uncle:

4_2

As you can see, the main reason I’m doing this is that it keeps related people separate while still letting me pool the data in the most efficient and quick manner possible, so that I need the fewest runs to produce results. I will send you a relative ID once I get the data.

If you have a 23andMe raw file, it should be something like genome_your_name_full.txt or if you zip it genome_your_name_full.zip. So you need to rename them like so: 1_1.txt
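The naming scheme above can be sketched in a few lines of Python for anyone who wants to check their filenames before sending them. This is only an illustration under stated assumptions: the function names `parse_relative_name` and `group_by_batch` are hypothetical, and the sketch assumes the first number is the batch, the second number distinguishes unrelated individuals within that batch, and the extension is either `.txt` or `.zip`.

```python
import re
from collections import defaultdict

# Assumed pattern: <batch>_<individual>, optionally followed by .txt or .zip
NAME_RE = re.compile(r"^(\d+)_(\d+)(?:\.(?:txt|zip))?$")


def parse_relative_name(filename):
    """Return (batch, individual) for a name such as '4_2' or '4_2.txt'."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not in <batch>_<individual> form: {filename!r}")
    batch, individual = int(match.group(1)), int(match.group(2))
    if batch < 1 or individual < 1:
        raise ValueError(f"numbers must start at 1: {filename!r}")
    return batch, individual


def group_by_batch(filenames):
    """Map each batch number to the sorted individuals that run together."""
    batches = defaultdict(list)
    for name in filenames:
        batch, individual = parse_relative_name(name)
        batches[batch].append(individual)
    return {batch: sorted(members) for batch, members in sorted(batches.items())}


# Three siblings, a maternal uncle, and a paternal cousin (the last example above)
files = ["1_1.txt", "2_1.txt", "3_1.txt", "4_1.txt", "4_2.txt"]
batches = group_by_batch(files)
print(batches)  # {1: [1], 2: [1], 3: [1], 4: [1, 2]}
```

Each key is one run; only batch 4 holds two people, because the maternal uncle and the paternal cousin are unrelated to each other.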

Until my relative runs are complete I’m not going to be assigning new AFIDs except in exceptional circumstances (I just got a Cape Coloured Sample, I’m running that!).

Posts of interest

Posted by – June 11, 2011

Since I have 5 Somalis, I decided to use them to my advantage. So two posts at my main weblog:

A genomic sketch of the Horn of Africa.

Flavors of Afro-Asiatic.

“Thick” K 10, 12, 14, AF001-AF111, with error

Posted by – June 7, 2011

I made a mistake by including two close relatives in the thick run, AF041 and AF098. As I reorganize the data set I figured I should post what I have anyway. Basically the family members take up 1 cluster by themselves. I also posted plots for all the ethnicities I have. Please note, since some of you seem unclear about this, that in many cases I did prune admixed individuals a priori from the data set (look at the Maya for example).

If you have an AF ID, please edit your ancestry information if you haven’t.

All the files are here.

Mistake in my “thick run”

Posted by – June 6, 2011

So I ran a “thick run” with 300,000 markers all weekend…and I realized today, when looking at the output, that I have a pair of close relatives in my data set (my mistake). I can’t really keep track of relatives easily in email, and need to generate a “relatives run,” so I have to rethink how to do the bookkeeping.

PC 1 to PC 6, AF001-AF111

Posted by – June 6, 2011

I ran the AAP sample so far with the “Thin Reference” (40 K markers) to generate principal component plots. Each axis represents a dimension of genetic variation. I ran the first 6 dimensions. The plots will not be to scale, so here are the eigenvalues of the dimensions (they give you a sense of the size of the dimensions, so the first dimension, which separates Africans from non-Africans, is the biggest, as it always is):

1, 45.840354
2, 10.501416
3, 8.768513
4, 3.583422
5, 3.384427
6, 2.479799
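
To make “the size of the dimensions” concrete, here is a minimal Python sketch that compares the six eigenvalues listed above. One caveat: the shares it prints are relative to these six dimensions only, not to the total variation in the data, because the remaining eigenvalues are not listed here. The variable names are just for illustration.

```python
# Eigenvalues of PC 1 to PC 6, copied from the list above
eigenvalues = [45.840354, 10.501416, 8.768513, 3.583422, 3.384427, 2.479799]

# Share of each dimension among the six listed (NOT of total variance)
total = sum(eigenvalues)
shares = [value / total for value in eigenvalues]

# Size of each dimension relative to the first one
ratios = [value / eigenvalues[0] for value in eigenvalues]

for number, (value, share, ratio) in enumerate(zip(eigenvalues, shares, ratios), start=1):
    print(f"PC {number}: eigenvalue {value:.3f}, "
          f"share of first six {share:.1%}, relative to PC 1 {ratio:.2f}")
```

When reading the plots, keep these ratios in mind: a gap along PC 1 represents much more variation than a gap of the same visual size along PC 6.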

You can generate your own plots with the table of eigenvectors I uploaded here.

A supervised run

Posted by – June 5, 2011

I’m going to do a supervised run soon. In the reference populations which ones should I set as “pure”?