<![CDATA[marcel's notes]]> 2022-11-16T00:00:00Z https://notes.marcel.science/ notes.marcel.science 2022-11-16T00:00:00Z https://notes.marcel.science/2022/jaxmd_nl <![CDATA[JAX-MD Neighbourlists in Non-Orthorhombic Systems]]> 2022-11-16

# JAX-MD Neighbourlists in Non-Orthorhombic Systems

I've written a quick colab notebook to point out that in non-orthorhombic systems, one needs to be slightly careful when using the `jax-md` neighbourlist implementation. The implementation works in fractional coordinates, and in systems where the lattice vectors are not orthogonal, the corresponding bins in real space are not rectangular, but rather skewed. So the bin sizes must be chosen based on the distance between opposite faces of the real-space bins (measured perpendicular to those faces), rather than on the edge lengths. In a skewed cell this perpendicular distance is smaller than the edge length, so bins sized by edge length can end up narrower than the cutoff...
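A minimal sketch of the geometry, in plain `numpy` rather than `jax-md` (none of this is `jax-md`'s actual code; `perpendicular_widths` and `bins_per_direction` are hypothetical helpers, and the lattice vectors are assumed to be the rows of `cell`). The distance between the two faces spanned by `b` and `c` is the cell volume divided by the area `|b × c|` of that face, and each bin needs to be at least one cutoff wide in that sense:

```python
import numpy as np

def perpendicular_widths(cell):
    """Distance between each pair of opposite cell faces.

    Assumes `cell` stores the lattice vectors as rows.
    """
    a, b, c = cell
    volume = abs(np.dot(a, np.cross(b, c)))
    # the face spanned by (b, c) has area |b x c|; width = volume / area
    return np.array([
        volume / np.linalg.norm(np.cross(b, c)),
        volume / np.linalg.norm(np.cross(c, a)),
        volume / np.linalg.norm(np.cross(a, b)),
    ])

def bins_per_direction(cell, cutoff):
    """Number of fractional bins along each lattice vector such that every
    bin is at least `cutoff` wide in real space (at least one bin each)."""
    widths = perpendicular_widths(cell)
    return np.maximum(np.floor(widths / cutoff), 1).astype(int)

# a skewed cell: all edges have length 10, but b leans strongly towards a
cell = np.array([[10.0, 0.0, 0.0],
                 [8.0, 6.0, 0.0],
                 [0.0, 0.0, 10.0]])
cutoff = 4.0
print(np.floor(np.linalg.norm(cell, axis=1) / cutoff))  # edge-based: [2. 2. 2.]
print(bins_per_direction(cell, cutoff))                  # width-based: [1 1 2]
```

Here the perpendicular widths are 6, 6 and 10. Sizing bins by edge length gives two bins along `a` and `b`, each only 3 wide perpendicular to its faces, which is less than the cutoff of 4, so neighbours can sit two bins apart and be missed by a search that only looks at adjacent bins.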

This is just a preliminary consideration -- the real fun part is thinking about this when both positions and cell can change between steps.

]]>
2022-07-12T00:00:00Z https://notes.marcel.science/2022/logscale <![CDATA[Removing ticks in log scale]]> 2022-07-12

# Removing ticks in log scale

It appears that `matplotlib` at some point changed the default behaviour of log-scale plots to place labelled minor ticks between the powers of the base. This means that if you override the standard ticks with `ax.set_xticks`, they will likely collide with those minor ticks, which need to be removed with `ax.set_xticks([], minor=True)`. Most of the easily found StackOverflow answers pre-date this change, which confused me for a good half hour!
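A minimal sketch of the fix (the data and tick positions are made up for illustration): set the log scale first, then the custom major ticks, then clear the minor ones.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only needed in scripts
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(2, 20, 50)
fig, ax = plt.subplots()
ax.plot(x, x**2)

# set the scale first: set_xscale installs fresh (log) locators,
# which would overwrite any ticks set before it
ax.set_xscale("log")
ax.set_xticks([2, 5, 10, 20])
ax.set_xticklabels(["2", "5", "10", "20"])
ax.set_xticks([], minor=True)  # drop the default minor ticks and their labels
```

The order matters: calling `set_xscale` after `set_xticks` would bring the default log locators (and their minor ticks) back.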

]]>
2022-04-26T00:00:00Z https://notes.marcel.science/2022/stress+painn <![CDATA[Stress and PaiNN]]> 2022-04-26

# Stress and PaiNN

Just for future reference, here's a gist that goes through several ways of computing the stress with autodiff in the `dev` branch of `schnetpack`. The expected output (once the straining of the unit cell is fixed) is:

``````
strain
[[[ -2.6313298  -2.980155   -8.266606 ]
[ -2.980213   -5.880828  -13.080657 ]
[ -8.266858  -13.08056   -33.894905 ]]]

strain_compute_offsets
[[[ -2.6313148  -2.9801168  -8.266688 ]
[ -2.9801168  -5.8808103 -13.080613 ]
[ -8.266622  -13.080922  -33.894863 ]]]

strain_rij
[[[ -2.6312795  -2.9801457  -8.266726 ]
[ -2.9801457  -5.8808045 -13.080628 ]
[ -8.266726  -13.080628  -33.894817 ]]]

rij
[[[ -2.6312802  -2.9801457  -8.266726 ]
[ -2.9801457  -5.880803  -13.0806265]
[ -8.266726  -13.0806265 -33.89482  ]]]

ase
[[ -2.63129346  -2.98017313  -8.2668025 ]
[ -2.98017313  -5.88085113 -13.08076411]
[ -8.2668025  -13.08076411 -33.89520126]]
``````

It's interesting to observe that only the approaches that explicitly use the atom-pair vectors yield a perfectly symmetric stress; everything else has some numerical noise.
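A quick way to quantify this is the largest absolute difference between a stress tensor and its transpose. Here it is for two of the tensors printed above (values copied verbatim, batch dimension dropped; `asymmetry` is just an illustrative helper):

```python
import numpy as np

# two of the stress tensors printed above (batch dimension dropped)
strain = np.array([[ -2.6313298,  -2.980155,   -8.266606 ],
                   [ -2.980213,   -5.880828,  -13.080657 ],
                   [ -8.266858,  -13.08056,   -33.894905 ]])
rij = np.array([[ -2.6312802,  -2.9801457,  -8.266726 ],
                [ -2.9801457,  -5.880803,  -13.0806265],
                [ -8.266726,  -13.0806265, -33.89482  ]])

def asymmetry(stress):
    """Largest absolute difference between the stress and its transpose."""
    return np.abs(stress - stress.T).max()

print(asymmetry(strain))  # ~2.5e-4, from the [0, 2] / [2, 0] pair
print(asymmetry(rij))     # 0.0
```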

]]>
2021-05-20T00:00:00Z https://notes.marcel.science/2021/idx-with-array <![CDATA[Indexing arrays with other arrays]]> 2021-05-20

# Indexing arrays with other arrays

Since I can't seem to fully internalise how `numpy` advanced indexing works, here's the solution to a common indexing task, written out for future reference.

Given an array `a` of dimension `[n, m, l]` and an index array `idx` of dimension `[n, k <= m]`, I'd like to define a new array `b` such that `b[i, j, :] = a[i, idx[i, j], :]`.

### Example

Let's make an example and a naive solution:

``````
import numpy as np

a = np.random.random((3, 3, 2))

idx = np.array([[1, 2], [0, 2], [0, 1]])

naive = np.zeros((*idx.shape, 2))
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        naive[i, j] = a[i, idx[i, j]]
``````

### Solution

The indexing-based solution, from this blog post, is:

``````
b = a[np.arange(a.shape[0])[:, None], idx, :]

# verify solution
np.testing.assert_array_equal(naive, b)
``````

This also works with `torch` tensors and seems reasonably fast. If I notice any performance problems, I'll update this post with a (hopefully) more efficient version...
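Why this works: `np.arange(n)[:, None]` has shape `[n, 1]` and broadcasts against `idx` of shape `[n, k]`, so both index arrays effectively have shape `[n, k]` and entry `[i, j]` picks `a[i, idx[i, j], :]`. The same gather can also be written with `np.take_along_axis`, which may read more clearly. A small check, with a random `a` made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 3, 2))
idx = np.array([[1, 2], [0, 2], [0, 1]])

# rows has shape (n, 1) and broadcasts against idx (n, k), so
# b[i, j, :] = a[rows[i, 0], idx[i, j], :] = a[i, idx[i, j], :]
rows = np.arange(a.shape[0])[:, None]
b = a[rows, idx, :]

# the same gather via np.take_along_axis: the trailing axis of length 1
# in the index array broadcasts over the last axis of `a`
b_alt = np.take_along_axis(a, idx[:, :, None], axis=1)

assert b.shape == (3, 2, 2)
assert np.array_equal(b, b_alt)
```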

]]>
2021-04-12T00:00:00Z https://notes.marcel.science/2021/strange-transpose <![CDATA[Transposing a 'sparse', randomly-ordered array]]> 2021-04-12

# Transposing a 'sparse', randomly-ordered array

This note is about a little programming difficulty I ran into recently. I'm putting it here so I can point people to it while trying to figure out if this is either (a) already known, (b) easy to solve with advanced indexing, (c) in fact not a real problem, or (d) actually hard.

Update: So, it appears that this is just a sparse matrix operation, and one way to solve it would be to transition everything to sparse matrices. The `scipy.sparse` package would be quite useful for this, and should support the format described below with `coo_matrix`. For my use case, I briefly used a `numba` implementation, given below, and then found a way to avoid having to use the transpose entirely. 😅 I'm leaving this note up in case anyone else runs into a similar problem!

Let's say we have a `numpy` array `u` with dimensions `[n, m, …]`, with `n>m`. From now on we'll ignore the additional indices `…` in `u`, as we're interested in operations on the first two indices.

The entries in `u` correspond to the non-zero entries of an implicit, large `[n, n, …]` `numpy` array `U`. We know the following about `U`: for each index pair `[i, k]`, if `U[i, k]` is non-zero, `U[k, i]` is also non-zero, and for each `i` there are at most `m` non-zero entries `k` in that row/column. Therefore, `u` contains all the relevant information about the non-zero elements of `U`. Each row of `u` may be padded with zeros at the end so that `u` is not ragged.

There is one more important twist: the entries in each row of `u` are ordered in some arbitrary, essentially random way. To track this, we have an index array `idx` with dimension `[n, m]`. Row `i` of `idx` contains the column indices (`k` above) in `U` corresponding to the entries of `u` at the same positions. The index `-1` is reserved for padded entries. So, for instance, if row `i` of `u` is `[1, 2, 3]`, corresponding to `[i, k3]`, `[i, k1]`, `[i, k2]` in `U`, then `idx[i] = [k3, k1, k2]`.

Our task is to compute the "transpose" `t` of `u`, in the sense that `t` has the same relationship to `transpose(U)` as `u` has to `U`: if the `[i, j]` entry of `u` corresponds to `[i, k]` in `U`, then `[i, j]` of `t` should correspond to `[k, i]`.

The entry `[i, j]` of `u` belongs in row `idx[i, j] = jj` of the transpose. The difficulty is figuring out the column index: We need to know which index in `u[jj, :]` belongs to `i`. We'll call this "reverse" index `ii` in the code.

For my particular use case, we'll need to compute this transpose many times for different `u`, but can expect `idx` to stay the same: the values we are transposing might change, but not the underlying structure.

### Example

This is rather confusing, so here is an example.

``````
u = np.array([[3, 0],
              [2, 1],
              [4, 0]], dtype=int)
idx = np.array([[1, -1],
                [2, 0],
                [1, -1]], dtype=int)
``````

with the result of our `transpose`:

``````
t = [[1, 0],
     [4, 3],
     [2, 0]]
``````

## Naive solution

Here is a naive pure-python implementation.

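``````
import numpy as np

def naive_transpose(u, idx):
    t = np.zeros_like(u)
    for i in range(u.shape[0]):
        for j in range(u.shape[1]):
            jj = idx[i, j]
            if jj == -1:
                break  # the rest of this row is padding

            # find the position ii of i in row jj ("reverse" index)
            for ii in range(u.shape[1]):
                if idx[jj, ii] == i:
                    break

            t[i, j] = u[jj, ii]

    return t
``````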

Clearly, this is inefficient: We run through each row of the `idx` array many times to search for a match. But it works!

## `numba` + indexing solution

This solution has two parts: First, we make an `[n, m, 2]` transpose index array that contains `idx` in the `[:, :, 0]` entries, and the "reverse" indices in `[:, :, 1]`. This part is implemented using `numba`, which provides a just-in-time (`jit`) compiler for a subset of `python`, speeding up the `for` loops considerably.

We then use some ✨ advanced indexing ✨ to collect the entries of `t` out of `u` based on the indices in our transpose index array.

This approach has the advantage of separating out the complicated index-finding part from the actual array operations: We'll only need to compute the transpose index array once, and then can re-use it for different `u`s with the same structure.

``````
import numba
import numpy as np

def get_transpose_idx(idx):
    transpose = np.zeros((*idx.shape[:2], 2), dtype=int)
    return _get_transpose_idx(idx.astype(int), transpose)

@numba.jit(nopython=True)
def _get_transpose_idx(idx, transpose):
    n, m = idx.shape

    for i in range(n):
        for j in range(m):
            # what entry are we inverting?
            jj = idx[i, j]
            if jj == -1:
                ii = -1
            else:
                ii = find(idx[jj], i)
            transpose[i, j, 0] = jj
            transpose[i, j, 1] = ii

    return transpose

@numba.jit(nopython=True)
def find(array, target):
    # linear search; returns -1 if target is not present
    n = len(array)
    for i in range(n):
        if array[i] == target:
            return i
    return -1

``````

(For better performance one needs to either find a way to make use of accumulated knowledge about "reverse" indices when going through `idx`, or optimise the lookup, for example by switching to a `dict`, where average lookup complexity is better.)
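A minimal sketch of the `dict` idea in plain Python (`get_transpose_idx_dict` is a hypothetical name, and I haven't benchmarked it against the `numba` version). One pass records where each column index sits in each row; a second pass looks up the "reverse" index, so the total work is on average proportional to `n * m` instead of `n * m**2`:

```python
import numpy as np

def get_transpose_idx_dict(idx):
    """Same output as get_transpose_idx, using a dict for the reverse lookup."""
    n, m = idx.shape
    # position[(i, k)] = j such that idx[i, j] == k, filled in a single pass
    position = {}
    for i in range(n):
        for j in range(m):
            k = int(idx[i, j])
            if k != -1:
                position[(i, k)] = j

    transpose = np.full((n, m, 2), -1, dtype=int)
    for i in range(n):
        for j in range(m):
            jj = int(idx[i, j])
            if jj != -1:
                transpose[i, j, 0] = jj
                transpose[i, j, 1] = position.get((jj, i), -1)
    return transpose

idx = np.array([[1, -1], [2, 0], [1, -1]])
print(get_transpose_idx_dict(idx)[:, :, 1])
# [[ 1 -1]
#  [ 0  0]
#  [ 0 -1]]
```

Whether this beats the `numba` loop in practice probably depends on `m`, since the linear search is cheap for short rows.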

Given `t_idx = get_transpose_idx(idx)`, the transpose itself can be computed via

``````
t = u[t_idx[:, :, 0], t_idx[:, :, 1], ...]
t[idx == -1] = 0  # padded slots would otherwise pick up u[-1, -1]
``````

The `...` (`Ellipsis`) keeps any additional trailing dimensions of `u`, so the same line works when `u` has more than two dimensions.

This approach should be reasonably fast, but please run your own benchmarks if you end up using it!
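For completeness, a minimal sketch of the `scipy.sparse` route mentioned in the update at the top, assuming a 2D `u` (no trailing dimensions). `sparse_transpose` is a hypothetical name, and this is not what I ended up using:

```python
import numpy as np
from scipy.sparse import coo_matrix

def sparse_transpose(u, idx):
    """Sketch: compute t via scipy.sparse (2D u only; -1 marks padding)."""
    n = u.shape[0]
    mask = idx != -1
    rows = np.broadcast_to(np.arange(n)[:, None], idx.shape)
    # U[i, idx[i, j]] = u[i, j] for every non-padded slot
    U = coo_matrix((u[mask], (rows[mask], idx[mask])), shape=(n, n))
    UT = U.T.tocsr()
    t = np.zeros_like(u)
    # t[i, j] = transpose(U)[i, idx[i, j]]; padded slots stay zero
    t[mask] = np.asarray(UT[rows[mask], idx[mask]]).ravel()
    return t

u = np.array([[3, 0], [2, 1], [4, 0]])
idx = np.array([[1, -1], [2, 0], [1, -1]])
print(sparse_transpose(u, idx))
# [[1 0]
#  [4 3]
#  [2 0]]
```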

]]>
2021-03-28T00:00:00Z https://notes.marcel.science/2021/ase-ipi-xyz <![CDATA[Interfacing ase and i-pi]]> 2021-03-28

# Interfacing ase and i-pi

### Context

The Atomic Simulation Environment (`ase`) is a python package providing common classes and various tools for describing and simulating atomistic systems. `i-PI` is a python package in the same domain, but focused more on path-integral molecular dynamics. For the `gknet` project, I'd like to run some simulations with `i-pi`, but my backend is based on `ase`. This note contains some pointers towards interfacing those two. (I'd recommend also checking out the documentation of `i-pi`!)

## Coordinate system

For periodic systems, `i-pi` internally works in a "canonical" coordinate frame where the first lattice vector points along the x axis, the second one lies in the x-y plane, and the third vector is arbitrary. (This makes the unit cell matrix triangular, which is convenient.)

Orienting the system in such a way can be achieved in `ase` with:

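``````
def orient_atoms(atoms):
    new = atoms.copy()
    # rebuilding the cell from lengths and angles puts it in the canonical orientation
    new.set_cell(atoms.cell.cellpar())
    # restore the original fractional coordinates in the rotated cell
    new.set_scaled_positions(atoms.get_scaled_positions())

    return new
``````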

This exploits the fact that `ase` uses the same convention as `i-pi` to reconstruct a unit cell from `cellpar` (the lengths and angles of the basis vectors).
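To see why this works, here is a `numpy`-only sketch of the canonical construction from lengths and angles. I believe it matches what `ase` does by default when given `cellpar`, but the helper names are hypothetical and this isn't `ase`'s code. In `ase`'s row convention the result is lower triangular (upper triangular once transposed to `i-pi`'s column convention):

```python
import numpy as np

def cell_to_cellpar(cell):
    """Lengths and angles (alpha, beta, gamma in degrees) of the rows of `cell`."""
    a1, a2, a3 = cell

    def angle(u, v):
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(cos))

    lengths = np.linalg.norm(cell, axis=1)
    return np.array([*lengths, angle(a2, a3), angle(a1, a3), angle(a1, a2)])

def canonical_cell(cellpar):
    """Cell (rows = lattice vectors) with a1 along x and a2 in the x-y plane."""
    a, b, c = cellpar[:3]
    alpha, beta, gamma = np.radians(cellpar[3:])
    cx = c * np.cos(beta)
    cy = c * (np.cos(alpha) - np.cos(beta) * np.cos(gamma)) / np.sin(gamma)
    cz = np.sqrt(c**2 - cx**2 - cy**2)  # |a3| = c fixes the remaining component
    return np.array([
        [a, 0.0, 0.0],
        [b * np.cos(gamma), b * np.sin(gamma), 0.0],
        [cx, cy, cz],
    ])

cell = np.array([[4.0, 0.5, 0.3],
                 [1.0, 5.0, -0.2],
                 [0.7, 1.2, 6.0]])
canon = canonical_cell(cell_to_cellpar(cell))
print(np.round(canon, 3))  # zeros above the diagonal
```

Since only lengths and angles enter, the canonical cell differs from a right-handed input cell by a rotation, so the Gram matrix `cell @ cell.T` is unchanged.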

## Notations for unit cell matrix

In `i-pi`, the basis vectors are the columns of a `cell` matrix, whereas in `ase`, the `cell` contains those vectors as rows. To convert from `ase`, one therefore has to use the transpose.

## Input format

`i-pi` supports the `.xyz` format for input geometries. However, `xyz` doesn't have a standard way to write down the basis vectors for periodic systems. In `i-pi`, there are three options to specify the basis: `H`, `abcABC`, and `GENH`.

For `H` and `abcABC`, `i-pi` expects the system to be rotated "canonically"; for `GENH` this requirement is waived, as the rotation is performed internally on import.

The `H` mode expects the unit cell as the `.flatten()`-ed version of the `i-pi`-style `cell` matrix, i.e.:

``````
def get_cellh(atoms):
    """Get H-mode cell spec from a canonically-oriented atoms object."""
    return atoms.get_cell().T.flatten()
``````

Note the transpose to convert from the `ase` row style. This format is also used in `input.xml` to specify the unit cell for barostats, so it's worth having at hand.

The `abcABC` mode expects the cell parameters:

``````
def get_cellabc(atoms):
    """Get abcABC-mode cell spec from a canonically-oriented atoms object."""
    return atoms.get_cell().cellpar()
``````

Since the cell parameters don't depend on the orientation of the structure, no alignment needs to be performed here.

Finally, the `GENH` mode expects the unit cell in row style (!). The `Atoms` object doesn't have to be re-oriented:

``````
def get_cellgenh(atoms):
    """Get GENH-mode cell spec from an atoms object."""
    return atoms.get_cell().flatten()

``````

However, since the re-orientation will be done anyway by `i-pi`, I'd recommend starting with a properly oriented system, and not using this mode. Anecdotally, it seems that the `ase` way results in less numerical noise in components of the positions that should be zero.

In all cases, the basis is represented as a comment line, formatted as:

``````
# CELL($MODE): $SOME $NUMBERS cell{$UNIT} positions{$UNIT}
``````

In other words, to get from `ase.Atoms` to `i-pi`-style `xyz`, one needs to run one of:

``````
from ase.io import write

# uses orient_atoms, get_cellh, get_cellabc and get_cellgenh from above

def xyz_cellh(filename, atoms):
    rotated = orient_atoms(atoms)
    comment = "# CELL(H):     " + "     ".join([f"{x:.5f}" for x in get_cellh(rotated)])
    comment += " cell{angstrom}"
    comment += " positions{angstrom}"

    write(filename, rotated, format="xyz", comment=comment)

def xyz_cellabc(filename, atoms):
    rotated = orient_atoms(atoms)
    comment = "# CELL(abcABC):     " + "     ".join(
        [f"{x:.5f}" for x in get_cellabc(rotated)]
    )
    comment += " cell{angstrom}"
    comment += " positions{angstrom}"

    write(filename, rotated, format="xyz", comment=comment)

def xyz_cellgenh(filename, atoms):
    # no re-orientation: i-pi rotates GENH input itself
    comment = "# CELL(GENH):     " + "     ".join(
        [f"{x:.5f}" for x in get_cellgenh(atoms)]
    )
    comment += " cell{angstrom}"
    comment += " positions{angstrom}"

    write(filename, atoms, format="xyz", comment=comment)

``````

## Small gotchas

### Logging

In log files, the positions are written in the canonical orientation and wrapped back into the unit cell. As far as I can tell, the positions sent to calculators are not wrapped, so if one wants to compare them for some reason, running `atoms.wrap()` makes the positions (mostly) comparable. There might still be some differences close to the unit cell boundaries, depending on tolerance settings.
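To make the boundary issue concrete, a minimal `numpy` sketch of wrapping into a fully periodic cell. `wrap_into_cell` and its `eps` convention are my own assumptions for illustration, not the implementation used by `ase` or `i-pi`:

```python
import numpy as np

def wrap_into_cell(positions, cell, eps=1e-7):
    """Wrap Cartesian positions into a fully periodic cell (rows = lattice vectors).

    Fractional coordinates end up in [-eps, 1 - eps), so atoms within eps
    below a face are moved to the opposite face.
    """
    scaled = np.linalg.solve(cell.T, positions.T).T  # Cartesian -> fractional
    scaled = (scaled + eps) % 1.0 - eps
    return scaled @ cell                             # fractional -> Cartesian

cell = np.diag([10.0, 10.0, 10.0])
print(wrap_into_cell(np.array([[11.0, -1.0, 5.0]]), cell))
```

Two codes that pick different tolerances can put an atom sitting within `eps` of a face on opposite sides of the cell, which is the kind of discrepancy described above.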

### Stress with `SocketClient`

If the stress is required, for instance for NPT, the client needs to be started with `client.run(atoms, use_stress=True)`.

Please note: this document is very much a work-in-progress, and should be taken as such. If you find anything wrong, just drop me a line at `mail@marcel.science`, or let me know on twitter. Same if you find anything wonky with the website, it's all new!

]]>

`np.loadtxt` doesn't support loading from a nice old-fashioned `str`. Luckily, `io.StringIO` neatly solves this problem:

``````
import io
import numpy as np

raw = """
# temperature, kappa, std
300,7.00000,1.03175
400,5.44444,0.76190
"""

with io.StringIO(raw) as f: