The Protein Data Bank (PDB) is the main repository of biomolecular strucutres. Let’s see what it contains:
stats <- read.csv("Data Export Summary.csv")
stats
## Molecular.Type X.ray EM NMR Integrative Multiple.methods
## 1 Protein (only) 178,795 21,825 12,773 343 226
## 2 Protein/Oligosaccharide 10,363 3,564 34 8 11
## 3 Protein/NA 9,106 6,335 287 24 7
## 4 Nucleic acid (only) 3,132 221 1,566 3 15
## 5 Other 175 25 33 4 0
## 6 Oligosaccharide (only) 11 0 6 0 1
## Neutron Other Total
## 1 84 32 214,078
## 2 1 0 13,981
## 3 0 0 15,759
## 4 3 1 4,941
## 5 0 0 237
## 6 0 4 22
stats$X.ray
## [1] "178,795" "10,363" "9,106" "3,132" "175" "11"
sum(stats$Neutron)
## [1] 88
The comma in these numbers leads to the numbers here bring read as character.
c("100", "10", "barry")
## [1] "100" "10" "barry"
#install.packages("readr")
library(readr)
stats <- read_csv("Data Export Summary.csv")
## Rows: 6 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Molecular Type
## dbl (4): Integrative, Multiple methods, Neutron, Other
## num (4): X-ray, EM, NMR, Total
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stats
## # A tibble: 6 × 9
## `Molecular Type` `X-ray` EM NMR Integrative `Multiple methods` Neutron
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Protein (only) 178795 21825 12773 343 226 84
## 2 Protein/Oligosacch… 10363 3564 34 8 11 1
## 3 Protein/NA 9106 6335 287 24 7 0
## 4 Nucleic acid (only) 3132 221 1566 3 15 3
## 5 Other 175 25 33 4 0 0
## 6 Oligosaccharide (o… 11 0 6 0 1 0
## # ℹ 2 more variables: Other <dbl>, Total <dbl>
n.xray <- sum(stats$`X-ray`)
#n.em <-
n.total <- sum(stats$Total)
n.xray/n.total
## [1] 0.8095077
Q1: What percentage of structures in the PDB are solved by X-Ray and Electron Microscopy.
n.xray <- sum(stats$`X-ray`)
n.em <- sum(stats$`EM`)
n.total <- sum(stats$Total)
(n.xray + n.em)/n.total
## [1] 0.937892
Q2: What proportion of structures in the PDB are protein?
n.protein <- sum(stats[1,9])
n.protein/n.total
## [1] 0.8596889
Q3: SKIP… Looking up HIV structures including 1HSG
We can use the Molstar viewer online: https://molstar.org/viewer/
A new clean image showing the catalytic ASP25 amino acids in both chaings of the HIV-Pr dimer along with the inhibitor and all important active site water.
library(bio3d)
pdb <- read.pdb("1hsg")
## Note: Accessing on-line PDB file
pdb
##
## Call: read.pdb(file = "1hsg")
##
## Total Models#: 1
## Total Atoms#: 1686, XYZs#: 5058 Chains#: 2 (values: A B)
##
## Protein Atoms#: 1514 (residues/Calpha atoms#: 198)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 172 (residues: 128)
## Non-protein/nucleic resid values: [ HOH (127), MK1 (1) ]
##
## Protein sequence:
## PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
## QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNFPQITLWQRPLVTIKIGGQLKE
## ALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTP
## VNIIGRNLLTQIGCTLNF
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
head(pdb$atom)
## type eleno elety alt resid chain resno insert x y z o b
## 1 ATOM 1 N <NA> PRO A 1 <NA> 29.361 39.686 5.862 1 38.10
## 2 ATOM 2 CA <NA> PRO A 1 <NA> 30.307 38.663 5.319 1 40.62
## 3 ATOM 3 C <NA> PRO A 1 <NA> 29.760 38.071 4.022 1 42.64
## 4 ATOM 4 O <NA> PRO A 1 <NA> 28.600 38.302 3.676 1 43.40
## 5 ATOM 5 CB <NA> PRO A 1 <NA> 30.508 37.541 6.342 1 37.87
## 6 ATOM 6 CG <NA> PRO A 1 <NA> 29.296 37.591 7.162 1 38.40
## segid elesy charge
## 1 <NA> N <NA>
## 2 <NA> C <NA>
## 3 <NA> C <NA>
## 4 <NA> O <NA>
## 5 <NA> C <NA>
## 6 <NA> C <NA>
#install.packages("pak")
pak::pak("bioboot/bio3dview")
## ! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
## ℹ Loading metadata database✔ Loading metadata database ... done
##
## ℹ No downloads are needed
## ✔ 1 pkg + 40 deps: kept 37 [7.5s]
library(bio3dview)
view.pdb(pdb)
# Select the important ASP 25 residue
sele <- atom.select(pdb, resno=25)
# and highlight them in spacefill representation
view.pdb(pdb, cols=c("navy","skyblue"),
highlight = sele,
highlight.style = "spacefill")
Read an ADK structure from the PDB database:
adk <- read.pdb("6s36")
## Note: Accessing on-line PDB file
## PDB has ALT records, taking A only, rm.alt=TRUE
adk
##
## Call: read.pdb(file = "6s36")
##
## Total Models#: 1
## Total Atoms#: 1898, XYZs#: 5694 Chains#: 1 (values: A)
##
## Protein Atoms#: 1654 (residues/Calpha atoms#: 214)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 244 (residues: 244)
## Non-protein/nucleic resid values: [ CL (3), HOH (238), MG (2), NA (1) ]
##
## Protein sequence:
## MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAKDIMDAGKLVT
## DELVIALVKERIAQEDCRNGFLLDGFPRTIPQADAMKEAGINVDYVLEFDVPDELIVDKI
## VGRRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKDDQEETVRKRLVEYHQMTAPLIG
## YYSKEAEAGNTKYAKVDGTKPVAEVRADLEKILG
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
m <- nma(adk)
## Building Hessian... Done in 0.019 seconds.
## Diagonalizing Hessian... Done in 0.093 seconds.
plot(m)
Write out our results as a wee ttrajectory/movie of predicted motions:
mktrj(m, file="adk_m7.pdb")