Cimetidine tutorial: indexing, spacegroup determination & structure solution

In this notebook, you can: * Load the powder diffraction data and create the PowderPattern object * Find the diffraction peaks and index them (determine the unit cell) * Perform a profile fit to optimise the background and reflection profiles * Determine the spacegroup * Add a molecule to describe the contents of the Crystal structure * Solve the Crystal structure using a Monte-Carlo/Parallel tempering algorithm * Save the best result to a CIF file and to Fox .xmlgz format

Notes: * This is an ideal case for structure solution from powder diffraction - a clean powder diffraction data easily indexed, an unambiguous spacegroup, and a relatively simple structure. * It is important to follow the steps relatively linearly and avoid going back to previous cells until you know better. For example to avoid adding multiple times Scatterer/Molecule objects in the crystal structure, or multiple crystalline phases to the powder pattern with the same crystal, etc…

Imports

[1]:
# 'widget' allows live update and works in both classical notebooks and jupyter-lab.
# Otherwise 'notebook', 'ipympl', 'inline' can be used
%matplotlib widget

import os
import pyobjcryst
import numpy as np
import matplotlib.pyplot as plt
from pyobjcryst.crystal import *
from pyobjcryst.powderpattern import *
from pyobjcryst.indexing import *
from pyobjcryst.molecule import *
from pyobjcryst.globaloptim import MonteCarlo
from pyobjcryst.io import xml_cryst_file_save_global

Create powder pattern object, download data if necessary

[2]:
p = PowderPattern()
if not os.path.exists("cime.dat"):
    os.system("curl -O https://raw.githubusercontent.com/vincefn/objcryst/master/Fox/example/tutorial-cimetidine/cime.dat")
p.ImportPowderPatternFullprof("cime.dat")
p.SetWavelength(1.52904)

p.plot()
Imported powder pattern: 7699 points, 2theta=  8.010 ->  84.990, step= 0.010

Find peaks & index the reflections

In this case the peaks are automatically found without any parasitic phase.

And the unit cell is also indexed without any ambiguity. This uses the dichotomy in volume approach (Louër & Boultif).

… It is not always that easy !

[3]:
# Index
pl = p.FindPeaks(1.5, -1, 1000)
if len(pl) > 20:
    pl.resize(20)  # Only keep 20 peaks
for peak in pl:
    print(peak)

ex = quick_index(pl)

print("Solutions:")
for s in ex.GetSolutions():
    print(s)
Peak dobs=0.10632+/-0.00011 iobs=2.146855e+04 (? ? ?))
Peak dobs=0.11354+/-0.00011 iobs=4.111350e+03 (? ? ?))
Peak dobs=0.14620+/-0.00011 iobs=9.429778e+04 (? ? ?))
Peak dobs=0.15277+/-0.00011 iobs=1.388049e+03 (? ? ?))
Peak dobs=0.16177+/-0.00011 iobs=1.420839e+03 (? ? ?))
Peak dobs=0.16602+/-0.00014 iobs=1.141690e+05 (? ? ?))
Peak dobs=0.18616+/-0.00011 iobs=1.609675e+05 (? ? ?))
Peak dobs=0.18839+/-0.00011 iobs=4.474511e+04 (? ? ?))
Peak dobs=0.18984+/-0.00011 iobs=1.839251e+05 (? ? ?))
Peak dobs=0.20064+/-0.00011 iobs=1.290410e+05 (? ? ?))
Peak dobs=0.20760+/-0.00011 iobs=1.182234e+05 (? ? ?))
Peak dobs=0.21186+/-0.00006 iobs=2.198665e+03 (? ? ?))
Peak dobs=0.21262+/-0.00011 iobs=8.717511e+03 (? ? ?))
Peak dobs=0.21507+/-0.00011 iobs=1.818877e+04 (? ? ?))
Peak dobs=0.22072+/-0.00008 iobs=2.098754e+04 (? ? ?))
Peak dobs=0.22153+/-0.00011 iobs=6.288388e+04 (? ? ?))
Peak dobs=0.22394+/-0.00011 iobs=1.562582e+05 (? ? ?))
Peak dobs=0.22705+/-0.00014 iobs=3.681909e+03 (? ? ?))
Peak dobs=0.23104+/-0.00011 iobs=2.493547e+04 (? ? ?))
Peak dobs=0.23505+/-0.00011 iobs=1.279133e+03 (? ? ?))
Predicting volumes from 20 peaks between d=94.058 and d= 4.254

Starting indexing using 20 peaks
      CUBIC P : V=   4699 ->  52534 A^3, max length=112.36A
 ->   0 sols in   0.00s, best score=   0.0

 TETRAGONAL P : V=   1744 ->  12585 A^3, max length= 69.78A
 ->   0 sols in   0.02s, best score=   0.0

RHOMBOEDRAL P : V=   1932 ->  13204 A^3, max length= 70.91A
 ->   0 sols in   0.00s, best score=   0.0

  HEXAGONAL P : V=   2382 ->  17415 A^3, max length= 77.76A
 ->   0 sols in   0.05s, best score=   0.0

ORTHOROMBIC P : V=   1014 ->   6526 A^3, max length= 56.06A
 ->   1 sols in   0.09s, best score=   7.3

 MONOCLINIC P : V=    756 ->   4210 A^3, max length= 48.44A
 ->   1 sols in   0.05s, best score= 130.0

Solutions:
( 6.83 18.82 10.39  90.0 106.4  90.0 V=1281 MONOCLINIC P, 130.0296630859375)

Create a crystal phase using the indexed unit cell

[4]:
uc = ex.GetSolutions()[0][0].DirectUnitCell()
c = pyobjcryst.crystal.Crystal(uc[0], uc[1], uc[2], uc[3], uc[4], uc[5], "P1")
pdiff = p.AddPowderPatternDiffraction(c)

# Plot with indexing in new figure
p.plot(diff=False,fig=None,hkl=True)

Fit the profile and background

We use a maximum sin(theta)/lambda because we don’t really need high angle/high resolution data.

This will go faster and is more reliable for spacegroup indexing and structure solution.

[5]:
p.SetMaxSinThetaOvLambda(0.3)
p.quick_fit_profile(auto_background=True,plot=False, init_profile=True,verbose=True)
p.quick_fit_profile(plot=False, init_profile=False, asym=True, displ_transl=True, verbose=False)

# Plot in new figure
p.plot(diff=True, fig=None, hkl=True)
print("Fit result: Rw=%6.2f%% Chi2=%10.2f  GoF=%8.2f  LLK=%10.3f" %
      (p.rw * 100, p.chi2, p.chi2/p.GetNbPointUsed(), p.llk))
No background, adding one automatically
Selected PowderPatternDiffraction:    with Crystal:
Profile fitting finished.
Remember to use SetExtractionMode(False) on the PowderPatternDiffraction object
to disable profile fitting and optimise the structure.
Fit result: Rw=  5.51% Chi2=  34095.69  GoF=    4.43  LLK=  6397.991

Find the spacegroup

The SpaceGroupExplorer can be used to find the optimal spacegroup.

What RunAll() does is try all spacegroups and settings which are compatible with the unit cell (in this case all monoclinic and triclinic), and perform a profile fit (Le Bail only, we don’t refine profile parameters or background since these parameters should be OK).

From this several values are extracted for each spacegroup setting: * Rw - the standard full-profile weighted R factor \(R_{wp}\) * GoF: the chi2 (full profile \(\chi^2=\Sigma\frac{(obs-calc)^2}{\sigma^2}\)) divided by the number of points used * nGoF: this is the Goodness-of-Fit, but computed on integration intervals defined by P1 reflections, and then multipled by the number of reflections used divided by the number of reflections for the P1 spacegroup. This is more discriminating and allows to put forward spacegroups which yield a good fit with more extinctions. * reflections is the number of reflections actually taken into account for this spacegroup up to the maximum sin(theta)/lambda * extinct446 gives the number of extinct reflections for 0<=H<=4 0<=K<=4 0<=L<=6 (which is used internally as a unique fingerprint for the extinctions)

Some C++ verbose output does not appear here but will be in the jupyter server log if you see it.

The results are sorting by ascending nGOF

[6]:
p.SetMaxSinThetaOvLambda(0.2)  # Important for stability of profile fit. And faster !
spgex = SpaceGroupExplorer(pdiff)

# NB:verbose C++ output does not appear in a notebook
spgex.RunAll(keep_best=True, update_display=False, fitprofile_p1=False)

for sol in spgex.GetScores():
    #if sol.nGoF > 4 * spgex.GetScores()[0].nGoF:
    if sol.GoF <= 2 * spgex.GetScores()[0].GoF:
        print(sol)

print("Chosen spacegroup (smallest nGoF): ", c.GetSpaceGroup())

# Updated plot with optimal spacegroup
p.plot(diff=True, fig=None, hkl=True, reset=True)
P 1 21/c 1    nGoF=   1.4866 GoF=  13.575 Rw= 6.50 [ 92 reflections, extinct446= 17]Beginning spacegroup exploration... 37 to go...
  (#  1) P 1           : Rwp=  6.80%  GoF=    14.98  nGoF=     3.24  (186 reflections,   0 extinct)
  (#  2) P -1          : Rwp=  6.80%  GoF=    14.98  nGoF=     3.24  (186 reflections,   0 extinct) [same extinctions as:P 1]
  (#  3) P 1 2 1       : Rwp=  6.69%  GoF=    14.46  nGoF=     1.84  (105 reflections,   0 extinct)
  (#  4) P 1 21 1      : Rwp=  6.61%  GoF=    14.11  nGoF=     1.65  (101 reflections,   2 extinct)
  (#  5) C 1 2 1       : Rwp= 62.70%  GoF=  1246.13  nGoF=   311.68  ( 52 reflections,  84 extinct)
  (#  5) A 1 2 1       : Rwp= 62.83%  GoF=  1253.74  nGoF=   313.87  ( 53 reflections,  85 extinct)
  (#  5) I 1 2 1       : Rwp= 60.91%  GoF=  1196.43  nGoF=   246.94  ( 52 reflections,  87 extinct)
  (#  6) P 1 m 1       : Rwp=  6.69%  GoF=    14.46  nGoF=     1.84  (105 reflections,   0 extinct) [same extinctions as:P 1 2 1]
  (#  7) P 1 c 1       : Rwp=  6.58%  GoF=    13.92  nGoF=     1.66  ( 96 reflections,  15 extin
P 1 21 1      nGoF=   1.6488 GoF=  14.108 Rw= 6.61 [101 reflections, extinct446=  2]
P 1 21/m 1    nGoF=   1.6488 GoF=  14.108 Rw= 6.61 [101 reflections, extinct446=  2]
P 1 c 1       nGoF=   1.6649 GoF=  13.922 Rw= 6.58 [ 96 reflections, extinct446= 15]
P 1 2/c 1     nGoF=   1.6649 GoF=  13.922 Rw= 6.58 [ 96 reflections, extinct446= 15]
P 1 2 1       nGoF=   1.8392 GoF=  14.458 Rw= 6.69 [105 reflections, extinct446=  0]
P 1 m 1       nGoF=   1.8392 GoF=  14.458 Rw= 6.69 [105 reflections, extinct446=  0]
P 1 2/m 1     nGoF=   1.8392 GoF=  14.458 Rw= 6.69 [105 reflections, extinct446=  0]
P 1           nGoF=   3.2428 GoF=  14.982 Rw= 6.80 [186 reflections, extinct446=  0]
P -1          nGoF=   3.2428 GoF=  14.982 Rw= 6.80 [186 reflections, extinct446=  0]
P 1 21/n 1    nGoF=   5.4155 GoF=  26.697 Rw= 9.11 [ 92 reflections, extinct446= 19]
P 1 n 1       nGoF=   5.7672 GoF=  27.063 Rw= 9.17 [ 96 reflections, extinct446= 17]
P 1 2/n 1     nGoF=   5.7672 GoF=  27.063 Rw= 9.17 [ 96 reflections, extinct446= 17]
Chosen spacegroup (smallest nGoF):  P 1 21/c 1
ct)
  (#  7) P 1 n 1       : Rwp=  9.17%  GoF=    27.06  nGoF=     5.77  ( 96 reflections,  17 extinct)
  (#  7) P 1 a 1       : Rwp=  9.26%  GoF=    27.58  nGoF=     5.97  ( 97 reflections,  14 extinct)
  (#  8) C 1 m 1       : Rwp= 62.70%  GoF=  1246.13  nGoF=   311.68  ( 52 reflections,  84 extinct) [same extinctions as:C 1 2 1]
  (#  8) A 1 m 1       : Rwp= 62.83%  GoF=  1253.74  nGoF=   313.87  ( 53 reflections,  85 extinct) [same extinctions as:A 1 2 1]
  (#  8) I 1 m 1       : Rwp= 60.91%  GoF=  1196.43  nGoF=   246.94  ( 52 reflections,  87 extinct) [same extinctions as:I 1 2 1]
  (#  9) C 1 c 1       : Rwp= 62.51%  GoF=  1236.15  nGoF=   280.76  ( 47 reflections,  93 extinct)
  (#  9) A 1 n 1       : Rwp= 62.99%  GoF=  1258.62  nGoF=   291.60  ( 49 reflections,  93 extinct)
  (#  9) I 1 a 1       : Rwp= 59.00%  GoF=  1120.91  nGoF=   221.05  ( 48 reflections,  93 extinct)
  (#  9) A 1 a 1       : Rwp= 62.99%  GoF=  1258.62  nGoF=   291.60  ( 49 reflections,  93 extinct) [same extinctions as:A 1 n 1]
  (#  9) C 1 n 1       : Rwp= 62.51%  GoF=  1236.15  nGoF=   280.76  ( 47 reflections,  93 extinct) [same extinctions as:C 1 c 1]
  (#  9) I 1 c 1       : Rwp= 59.00%  GoF=  1120.91  nGoF=   221.05  ( 48 reflections,  93 extinct) [same extinctions as:I 1 a 1]
  (# 10) P 1 2/m 1     : Rwp=  6.69%  GoF=    14.46  nGoF=     1.84  (105 reflections,   0 extinct) [same extinctions as:P 1 2 1]
  (# 11) P 1 21/m 1    : Rwp=  6.61%  GoF=    14.11  nGoF=     1.65  (101 reflections,   2 extinct) [same extinctions as:P 1 21 1]
  (# 12) C 1 2/m 1     : Rwp= 62.70%  GoF=  1246.13  nGoF=   311.68  ( 52 reflections,  84 extinct) [same extinctions as:C 1 2 1]
  (# 12) A 1 2/m 1     : Rwp= 62.83%  GoF=  1253.74  nGoF=   313.87  ( 53 reflections,  85 extinct) [same extinctions as:A 1 2 1]
  (# 12) I 1 2/m 1     : Rwp= 60.91%  GoF=  1196.43  nGoF=   246.94  ( 52 reflections,  87 extinct) [same extinctions as:I 1 2 1]
  (# 13) P 1 2/c 1     : Rwp=  6.58%  GoF=    13.92  nGoF=     1.66  ( 96 reflections,  15 extinct) [same extinctions as:P 1 c 1]
  (# 13) P 1 2/n 1     : Rwp=  9.17%  GoF=    27.06  nGoF=     5.77  ( 96 reflections,  17 extinct) [same extinctions as:P 1 n 1]
  (# 13) P 1 2/a 1     : Rwp=  9.26%  GoF=    27.58  nGoF=     5.97  ( 97 reflections,  14 extinct) [same extinctions as:P 1 a 1]
  (# 14) P 1 21/c 1    : Rwp=  6.50%  GoF=    13.58  nGoF=     1.49  ( 92 reflections,  17 extinct)
  (# 14) P 1 21/n 1    : Rwp=  9.11%  GoF=    26.70  nGoF=     5.42  ( 92 reflections,  19 extinct)
  (# 14) P 1 21/a 1    : Rwp=  9.20%  GoF=    27.22  nGoF=     5.61  ( 93 reflections,  16 extinct)
  (# 15) C 1 2/c 1     : Rwp= 62.51%  GoF=  1236.15  nGoF=   280.76  ( 47 reflections,  93 extinct) [same extinctions as:C 1 c 1]
  (# 15) A 1 2/n 1     : Rwp= 62.99%  GoF=  1258.62  nGoF=   291.60  ( 49 reflections,  93 extinct) [same extinctions as:A 1 n 1]
  (# 15) I 1 2/a 1     : Rwp= 59.00%  GoF=  1120.91  nGoF=   221.05  ( 48 reflections,  93 extinct) [same extinctions as:I 1 a 1]
  (# 15) A 1 2/a 1     : Rwp= 62.99%  GoF=  1258.62  nGoF=   291.60  ( 49 reflections,  93 extinct) [same extinctions as:A 1 n 1]
  (# 15) C 1 2/n 1     : Rwp= 62.51%  GoF=  1236.15  nGoF=   280.76  ( 47 reflections,  93 extinct) [same extinctions as:C 1 c 1]
  (# 15) I 1 2/c 1     : Rwp= 59.00%  GoF=  1120.91  nGoF=   221.05  ( 48 reflections,  93 extinct) [same extinctions as:I 1 a 1]
Restoring best spacegroup: P 1 21/c 1

Add Cimetidine molecule to the crystal structure

This is imported from a Fenske-Hall z-matrix, downloaded as needed.

We can also look at options, and disable the Dynamical Occupancy Correction, because we don’t expect the molecule atoms to be overlapping with each other.

[7]:
if not os.path.exists("cime.fhz"):
    os.system("curl -O https://raw.githubusercontent.com/vincefn/objcryst/master/Fox/example/tutorial-cimetidine/cime.fhz")

# This will automatically add the Molecule to the Crystal object
m = ImportFenskeHallZMatrix(c,"cime.fhz")
print("Crystal Formula:", c.GetFormula())
print()

# List the options to see which are available
for i in range(c.GetNbOption()):
    o = c.GetOption(i)
    print("Option #%d: %s = %s"% (i, o.GetName(), o.GetChoiceName(o.GetChoice())))
    for j in range (o.GetNbChoice()):
        print("   Choice %d: %s" %(j, o.GetChoiceName(j)))


c.GetOption(1).SetChoice(0)
Crystal Formula: C8.9 N6 S

Option #0: Constrain Lattice to SpaceGroup Symmetry = Yes (Default)
   Choice 0: Yes (Default)
   Choice 1: No (Allow Crystallographic Pseudo-Symmetry)
Option #1: Use Dynamical Occupancy Correction = Yes
   Choice 0: No
   Choice 1: Yes
Option #2: Display Enantiomer = No
   Choice 0: No
   Choice 1: Yes

Create a MonteCarlo object and add objects (crystal, powder pattern) for optimisation

[8]:
mc = MonteCarlo()
mc.AddRefinableObj(c)
mc.AddRefinableObj(p)

Disable profile fitting before Monte-Carlo

..or else the crystal structure will not be optimised

Note that the following display will be live-updated during the optimisation done below (the last plot is alwasy updated).

[9]:
pdiff.SetExtractionMode(False)
p.FitScaleFactorForRw() # for a better initial display
p.plot(fig=None,diff=False,hkl=True)

Display the 3D crystal structure

Note: this requires installing ``ipywidgets`` and ``py3Dmol`` (as of 2021/05 the conda-forge version is obsolete, so just install using pip). Otherwise You will just get a warning message

This will be updated live during the optimisation, and also when using RestoreParamSet() to restore some specific solutions (and generally everytime the underlying Crystal’s UpdateDisplay() function is called). Just scroll back to see what is being done in the widget.

The display() is only really necessary to make sure the widget appears in the notebook. In fact if c.widget_3d() is the last command in the notebook cell, the display is done automatically. See the ipywidgets documentation if you want to understand this in more details.

Note that bonds may disappear during optimisation, because they are automatically assigned by the javascript viewer, which is quite strict about allowed distances. In the final solution some bonds in the middle of the chain are often missing, though you can see the atoms are reasonably close. But rest assured that any bond defined in the object still exists as defined in pyobjcryst !

[10]:
display(c.widget_3d())

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Run multiple optimisations

We also enable the automatic least squares every 150k trials, which allows a better convergence

We perform 3 runs, each of 1 million trials using parallel tempering, with default parameters (which should be adequate for all problems). Normally for this structure it would be better to use 2 millions trials so that the correct solution is found during almost every run.

Each run starts from a randomised configuration.

[11]:
mc.GetOption("Automatic Least Squares Refinement").SetChoice(2)
print("LSQ option: ", mc.GetOption("Automatic Least Squares Refinement").GetChoiceName(2))

# 3D structure view which will be live-updated with the best
# configuration of the current run
display(c.widget_3d())

# Small widget to see the progress of the optimisation, with the current run
# best log-likelihood, the run number and remaining number of trials.
display(mc.widget())

# The powder pattern plot a few cells above should also be updated for each run best solution
mc.MultiRunOptimize(nb_run=3, nb_step=1e6)
print("Final LLK: %.2f" % mc.GetLogLikelihood())
LSQ option:  Every 150000 trials, and at the end of each run

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Final LLK: 18501.63

List solutions

All solutions are stored in a “Parameter Set” which can be restored (assuming that the objects - crystal structure and powder pattern are not altered e.g. by changing the list of atoms, the profile, or the fixed parameters etc…).

This will only record changes of parameters such as atom coordinates, but will not record other changes such as a different spacegroup, or a change of the Scatterers (number of atoms or molecules) inside a Crystal. It can only be used to browse results obtained at the end of MultiRunOptimize().

At the end of the optimisation the best solution is automatically restored.

[12]:
for i in range(mc.GetNbParamSet()):
    idx = mc.GetParamSetIndex(i)
    cost = mc.GetParamSetCost(i)
    name = mc.GetFullRefinableObj().GetParamSetName(idx)
    print("%3d: LLK=%10.2f, name=%s"%(idx, cost, name))
  0: LLK=  20862.00, name=Best Configuration
  1: LLK= 130619.00, name=Run #3
  2: LLK=  20862.00, name=Run #2
  3: LLK=  20948.00, name=Run #1

Restore a chosen solution (set of parameters)

Restoring a solution will also update the 3D crystal view above.

[13]:
p.plot(fig=None, diff=True)
mc.RestoreParamSet(3, update_display=True)

Save results to CIF and Fox (.xmlgz) formats

[14]:
# Save result so it can be opened by Fox
xml_cryst_file_save_global('result.xmlgz')
# Also export to the CIF format
c.CIFOutput("result.cif")