On which human chromosome is this SNP located and at what position?
This SNP is on chromosome 12, on the q (long) arm, within the band 12q12. You may also have noticed its position designated as 40340400, which is its coordinate in nucleotides, counted from the tip of the short (p) arm of the chromosome.
Does rs34637584 occur within a gene or between genes? If it occurs within a gene, is it within an exon or an intron?
This SNP is within exon 41 (of 51) of the gene LRRK2.
Describe the alleles that occur through variation at this site.
Examining the data from the NHGRI Catalog of Published GWAS (click on the accession number of the SNP itself to see this quickly) or the dbSNP data shows that two variations have been reported for the nucleotide at this position: the wild-type allele has a G nucleotide here, and the allele identified by Do et al. as correlated with PD (0.2% of all alleles in the population) has an A.
List the genes found within approximately 500 kb on either side of the SNP.
Toward the left we find genes (nearest to farthest) AC079630.1, LINC02471, LINC02553, SLC2A13.
Toward the right we find genes (nearest to farthest) AC107023.1, CNTN1, MUC19.
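The 500 kb search window can be worked out from the SNP position quoted earlier (40340400). The following is a minimal sketch, assuming 1-based coordinates and the common `chr:start-end` region notation accepted by most genome browsers; the function name is illustrative only.

```python
SNP_CHROM = "chr12"
SNP_POS = 40_340_400   # position of rs34637584 quoted in the first answer
FLANK = 500_000        # 500 kb on either side


def flanking_window(pos, flank):
    """Return (start, end) of a window of +/- flank around pos, clamped at 1."""
    start = max(1, pos - flank)
    end = pos + flank
    return start, end


start, end = flanking_window(SNP_POS, FLANK)
region = f"{SNP_CHROM}:{start}-{end}"
print(region)  # chr12:39840400-40840400
```

Pasting a region string of this form into the browser's position box should display the genes listed above.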
Has this site or any of the nearby genes been associated with PD previously?
The descriptions of the genes in OMIM are the easiest way to find these associations. LRRK2 (OMIM accession no. 609007) is in fact the gene most frequently mutated in inherited cases of PD, and this was known prior to the Do et al. experiment. (This is actually a valuable finding, because it helps to validate the GWAS experiment!) SLC2A13 has no known association with disease. CNTN1 is associated with a form of congenital myopathy (muscle disease) but has no apparent association with PD or any brain disorder.
What evidence did you find to support the identification of one or more of the genes in this region as a candidate PD-associated gene?
Surprisingly, given the known association of LRRK2 with PD, there is not a lot of obvious genomic evidence to support a link. The gene does not appear to be strongly expressed in the brain based on expression experiments to date, though there is some expression in nerve tissue. If the connection of mutations in this region with PD were not already known, one might wonder whether researchers identifying this SNP in a GWAS would find enough evidence to encourage them to pursue further characterization!
How long is the entire SREBF1 gene?
You can see from the various depictions of this gene in the UCSC Genes track that different combinations of exons have been found in different experiments looking at this gene. The best place to get the most clearly defined information is probably the RefSeq track; clicking the gene name in this track gives a genomic size of 25,663 bp (25.7 kb), though the source gene includes 32,663 bp.
When you click on the CDS link in a GenBank entry, only short segments of the gene sequence are highlighted. What do these highlighted segments represent?
These are the exons: the pieces of the coding sequence that are separated by introns in the DNA sequence but brought together by splicing after the mRNA is synthesized.
How long is the spliced mRNA for SREBF1? What fraction of the gene is thrown away in the form of spliced-out introns?
One good way to get this information is to click the mRNA link in a GenBank entry, then click at the lower right to get the mRNA in FASTA format and do a character count in a text editor. Or, navigate to the Gene database and hover over the gene in the genome browser display there. The spliced mRNA for SREBF1 is 5001 nt long (or, you may find a different splicing variant that is 4922 nt long). This means that 20,662 nt (or 20,741 nt), or about 81% of the total length of the gene, is removed as introns during splicing.
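The character count and the intron arithmetic can both be checked with a few lines of Python. This is only a sketch: the gene and mRNA lengths are the ones quoted above, while the toy FASTA record is made up purely to show how the count works.

```python
def fasta_sequence_length(fasta_text):
    """Count sequence characters in FASTA text, skipping header lines and whitespace."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def intron_fraction(gene_len, mrna_len):
    """Return (nucleotides removed by splicing, fraction of the gene removed)."""
    removed = gene_len - mrna_len
    return removed, removed / gene_len


# Made-up record, only to demonstrate the counting function.
toy_fasta = ">toy_record\nACGTAC\nGGT\n"
print(fasta_sequence_length(toy_fasta))  # 9

GENE_LEN = 25_663  # genomic size from the RefSeq track
for mrna_len in (5001, 4922):  # the two splice variants mentioned above
    removed, frac = intron_fraction(GENE_LEN, mrna_len)
    print(f"mRNA {mrna_len} nt: {removed} nt removed ({frac:.1%})")
```

Both variants come out at roughly 81% (80.5% and 80.8%), matching the figure given in the answer.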
What accounts for the difference between the sequence segments that are highlighted when you click on the mRNA link versus when you click on the CDS link?
The mRNA includes untranslated sequences on the ends of the mRNA (necessary, for example, in order for the large ribosome to bind the mRNA and position itself at the start codon) as well as the actual coding sequence (start codon to stop codon, encoding the amino acids that make up the protein).
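The relationship between the mRNA segments and the CDS segments can be illustrated with a toy gene. All coordinates below are hypothetical (they are not SREBF1 coordinates); the sketch simply clips each exon to the span between the start and stop codons, which is one way to picture what the CDS link highlights.

```python
# Hypothetical toy gene: exon coordinates are 1-based and inclusive.
exons = [(101, 200), (301, 450), (601, 700)]
cds_start, cds_end = 151, 652  # made-up first base of start codon, last base of stop codon


def total_length(segments):
    """Sum the lengths of inclusive (start, end) segments."""
    return sum(end - start + 1 for start, end in segments)


def clip_to_cds(exon_list, first, last):
    """Keep only the part of each exon that lies between first and last."""
    clipped = []
    for start, end in exon_list:
        s, e = max(start, first), min(end, last)
        if s <= e:
            clipped.append((s, e))
    return clipped


cds_segments = clip_to_cds(exons, cds_start, cds_end)
mrna_len = total_length(exons)        # all exons, including untranslated ends
cds_len = total_length(cds_segments)  # start codon through stop codon
utr_len = mrna_len - cds_len          # untranslated sequence at both ends

print(cds_segments)                   # [(151, 200), (301, 450), (601, 652)]
print(mrna_len, cds_len, utr_len)     # 350 252 98
```

The first and last exons are shortened in the CDS view because part of each is untranslated, while the internal exon is unchanged.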
How long is the SREBF1 protein in amino acids? What are the first 10 amino acids in the protein sequence?
The actual coding sequence (CDS link) is 3534 bp long when the stop codon is counted (3531 bp without it). You could just divide 3534 by three to get 1178 codons, but remember that the stop codon shouldn’t be included as an amino acid, so you’d need to subtract one to get 1177 amino acids. Or, just find the protein in a protein database or look for its length in the Gene database.
The first 10 amino acids are MDEPPFSEAA, or Met-Asp-Glu-Pro-Pro-Phe-Ser-Glu-Ala-Ala.
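The codon arithmetic depends on whether the CDS length you start from counts the stop codon. The sketch below makes that choice explicit and also converts the one-letter sequence to three-letter codes; the function names are illustrative, and the 3534 bp figure is simply 1177 codons plus a 3-nt stop codon.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def protein_length(cds_len, includes_stop=True):
    """Number of amino acids encoded by a CDS of cds_len nucleotides."""
    if cds_len % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = cds_len // 3
    # The stop codon terminates translation and encodes no amino acid.
    return codons - 1 if includes_stop else codons


def to_three_letter(peptide):
    """Convert a one-letter peptide string to hyphenated three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in peptide)


print(protein_length(3534, includes_stop=True))   # 1177
print(protein_length(3531, includes_stop=False))  # 1177
print(to_three_letter("MDEPPFSEAA"))
# Met-Asp-Glu-Pro-Pro-Phe-Ser-Glu-Ala-Ala
```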
What is known about the function of this gene?
This gene encodes a transcription factor, a protein which binds to specific DNA sequences and stimulates the transcription of other genes. In this case, the binding site for the SREBF1 protein is called SRE-1, and binding of SREBF1 protein to this site regulates genes involved in making sterols, the molecules that are made into steroid hormones and cholesterol.
Besides its hypothetical association with PD, what are two other known connections of SREBF1 to disease?
SREBF1 is not linked to any diseases in OMIM. However, as you might guess from its function, association studies tentatively link this gene to atherosclerosis, obesity and type 2 diabetes.