`np.loadtxt` doesn't support loading from a nice old-fashioned `str`. Luckily, `io.StringIO` neatly solves this problem:

```
import io

import numpy as np

raw = """
# temperature, kappa, std
300,7.00000,1.03175
400,5.44444,0.76190
"""

with io.StringIO(raw) as f:
    data = np.loadtxt(f, delimiter=",")
```

Being able to pretend strings are files is surprisingly useful, especially for testing file-based import/export routines!
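The reverse direction works the same way; here is a sketch of a round trip through an in-memory "file" (the array values are made up for illustration):

```python
import io

import numpy as np

data = np.array([[300.0, 7.0], [400.0, 5.44444]])

# export to an in-memory "file" instead of the filesystem
buf = io.StringIO()
np.savetxt(buf, data, delimiter=",")

# rewind and re-import to check the round trip
buf.seek(0)
roundtrip = np.loadtxt(buf, delimiter=",")
```

This pattern lets a test exercise the full export/import path without touching the disk.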

The Atomic Simulation Environment (`ase`) is a python package providing common classes and various tools for describing and simulating atomistic systems. `i-PI` is a python package in the same domain, but focused more on path-integral molecular dynamics. For the `gknet` project, I'd like to run some simulations with `i-pi`, but my backend is based on `ase`. This note contains some pointers towards interfacing those two. (I'd recommend also checking out the documentation of `i-pi`!)

For periodic systems, `i-pi` internally works in a "canonical" coordinate frame where the first lattice vector points along the x axis, the second one lies in the x-y plane, and the third vector is arbitrary. (This makes the unit cell matrix triangular, which is convenient.)

Orienting the system in such a way can be achieved in `ase` with:

```
def orient_atoms(atoms):
    new = atoms.copy()
    new.set_cell(atoms.cell.cellpar())
    new.set_scaled_positions(atoms.get_scaled_positions())
    return new
```

This exploits the fact that `ase` uses the same convention as `i-pi` to reconstruct a unit cell from `cellpar` (the lengths and angles of the basis vectors).
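To illustrate why reconstructing from `cellpar` lands in the canonical orientation, here is a pure-`numpy` sketch of that reconstruction (a simplified stand-in for what `ase` does internally; the function name is mine):

```python
import numpy as np

def cell_from_cellpar(a, b, c, alpha, beta, gamma):
    """Rebuild a (row-style) cell from lengths and angles in degrees."""
    al, be, ga = np.radians([alpha, beta, gamma])
    # first vector along x, second in the x-y plane: the row-style
    # matrix is lower-triangular by construction
    v1 = np.array([a, 0.0, 0.0])
    v2 = np.array([b * np.cos(ga), b * np.sin(ga), 0.0])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    v3 = np.array([cx, cy, np.sqrt(c * c - cx * cx - cy * cy)])
    return np.array([v1, v2, v3])
```

Since only lengths and angles survive the trip through `cellpar`, any rotation of the original cell is discarded, and the rebuilt cell is always triangular.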

In `i-pi`, the basis vectors are the *columns* of a `cell` matrix, whereas in `ase`, the `cell` contains those vectors as rows. To convert from `ase`, one therefore has to use the transpose.
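In `numpy` terms (the numbers are made up for illustration):

```python
import numpy as np

# ase convention: rows are the lattice vectors
cell_ase = np.array([[4.0, 0.0, 0.0],
                     [0.5, 4.0, 0.0],
                     [0.1, 0.2, 4.0]])

# i-PI convention: columns are the lattice vectors
cell_ipi = cell_ase.T

# the first lattice vector is row 0 in ase, column 0 in i-PI
assert (cell_ipi[:, 0] == cell_ase[0]).all()
```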

`i-pi` supports the `.xyz` format for input geometries. However, `xyz` doesn't have a standard way to write down the basis vectors for periodic systems. In `i-pi`, there are three options to specify the basis: `H`, `abcABC`, and `GENH`.

For `H` and `abcABC`, `i-pi` expects the system to be rotated "canonically"; for `GENH` this is waived, as the rotation is performed internally on import.

The `H` mode expects the unit cell as the `.flatten()`-ed version of the `i-pi`-style `cell` matrix, i.e.:

```
def get_cellh(atoms):
    """Get H-mode cell spec from a canonically-oriented atoms object."""
    return atoms.get_cell().T.flatten()
```

Note the transpose to convert from `ase` row style. This format is also used in `input.xml` to specify the unit cell for barostats, so it's useful to know.

The `abcABC` mode expects the cell parameters:

```
def get_cellabc(atoms):
    """Get abcABC-mode cell spec from a canonically-oriented atoms object."""
    return atoms.get_cell().cellpar()
```

Since the cell parameters don't depend on the orientation of the structure, no alignment needs to be performed here.

Finally, the `GENH` mode expects the unit cell in row style (!). The `Atoms` object doesn't have to be re-oriented:

```
def get_cellgenh(atoms):
    """Get GENH-mode cell spec from an atoms object."""
    return atoms.get_cell().flatten()
```

However, since the re-orientation will be done anyway by `i-pi`, I'd recommend starting with a properly oriented system, and not using this mode. Anecdotally, it seems that the `ase` way results in less numerical noise in components of the positions that should be zero.

In all cases, the basis is represented as a comment line, formatted as:

```
# CELL($MODE): $SOME $NUMBERS cell{$UNIT} positions{$UNIT}
```

In other words, to get from `ase.Atoms` to `i-pi`-style `xyz`, one needs to run one of:

```
from ase.io import write

def xyz_cellh(filename, atoms):
    rotated = orient_atoms(atoms)
    comment = "# CELL(H): " + " ".join([f"{x:.5f}" for x in get_cellh(rotated)])
    comment += r" cell{angstrom}"
    comment += r" positions{angstrom}"
    write(filename, rotated, format="xyz", comment=comment)

def xyz_cellabc(filename, atoms):
    rotated = orient_atoms(atoms)
    comment = "# CELL(abcABC): " + " ".join(
        [f"{x:.5f}" for x in get_cellabc(rotated)]
    )
    comment += r" cell{angstrom}"
    comment += r" positions{angstrom}"
    write(filename, rotated, format="xyz", comment=comment)

def xyz_cellgenh(filename, atoms):
    comment = "# CELL(GENH): " + " ".join(
        [f"{x:.5f}" for x in get_cellgenh(atoms)]
    )
    comment += r" cell{angstrom}"
    comment += r" positions{angstrom}"
    write(filename, atoms, format="xyz", comment=comment)
```

The positions are written down canonically oriented in log files and are also wrapped back into the unit cell. As far as I can tell, the positions sent to calculators are not wrapped, so if one wants to compare for some reason, running `atoms.wrap()` makes positions (mostly) comparable. There might still be some differences close to the unit cell boundaries, depending on tolerance settings.
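What wrapping amounts to can be sketched in plain `numpy` (a simplified version without the tolerance handling `ase` applies; the function name is mine):

```python
import numpy as np

def wrap_positions(positions, cell):
    """Wrap Cartesian positions into the unit cell (cell rows = vectors)."""
    frac = positions @ np.linalg.inv(cell)  # to fractional coordinates
    frac %= 1.0                             # wrap into [0, 1)
    return frac @ cell                      # back to Cartesian
```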

One more note on the `ase` `SocketClient`: if the stress is required, for instance for NPT, the client needs to be started with `client.run(atoms, use_stress=True)`.

Please note: this document is very much a work-in-progress, and should be taken as such. If you find anything wrong just drop me a line at `mail@marcel.science`, or let me know on twitter. Same if you find anything wonky with the website, it's all new!

Since I can't seem to fully internalise how `numpy` advanced indexing works, here's the solution to a common indexing task, written out for future reference.

Given an array `a` of dimension `[n, m, l]` and an index array `idx` of dimension `[n, k <= m]`, I'd like to define a new array `b` such that `b[i, j, :] = a[i, idx[i, j], :]`.

Let's make an example and a naive solution:

```
import numpy as np

a = np.random.random((3, 3, 2))
idx = np.array([[1, 2], [0, 2], [0, 1]])

naive = np.zeros((*idx.shape, 2))
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        naive[i, j] = a[i, idx[i, j]]
```

The indexing-based solution, from this blog post, is:

```
b = a[np.arange(a.shape[0])[:, None], idx, :]
# verify solution
np.testing.assert_array_equal(naive, b)
```

This also works with `torch` tensors and seems reasonably fast. If I notice any performance problems, I'll update this post with a (hopefully) more efficient version...
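For reference, `np.take_along_axis` expresses the same gather and might read more clearly; a sketch with the same shapes as above:

```python
import numpy as np

a = np.random.random((3, 3, 2))
idx = np.array([[1, 2], [0, 2], [0, 1]])

# indices must have the same ndim as a; the trailing
# length-1 axis broadcasts across the last dimension of a
b = np.take_along_axis(a, idx[:, :, None], axis=1)
```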

This note is about a little programming difficulty I ran into recently. I'm putting it here so I can point people to it while trying to figure out if this is either (a) already known, (b) easy to solve with advanced indexing, (c) in fact not a real problem, or (d) actually hard.

**Update:** So, it appears that this is just a sparse matrix operation, and one way to solve it would be to transition everything to sparse matrices. The `scipy.sparse` package would be quite useful for this, and should support the format described below with `coo_matrix`. For my usecase, I briefly used a `numba` implementation, given below, and then found a way to avoid having to use the transpose entirely. 😅 I'm leaving this note up in case anyone else runs into a similar problem!

Let's say we have a `numpy` array `u` with dimensions `[n, m, …]`, with `n > m`. From now on we'll ignore the additional indices `…` in `u`, as we're interested in operations on the first two indices.

The entries in `u` correspond to the non-zero entries of an implicit, large `[n, n, …]` `numpy` array `U`. We know the following about `U`: for each index pair `[i, k]`, if `U[i, k]` is nonzero, `U[k, i]` is also nonzero, and for each `i` there are at most `m` non-zero entries `k` in that row/column. Therefore, `u` contains all the relevant information about the non-zero elements of `U`. Each row of `u` may be padded with zero entries at the end to make `u` not ragged.

There is one more important twist: the entries in each row of `u` are ordered in some arbitrary, essentially random way. To track this, we have an index array `idx` with dimension `[n, m]`. Row `i` of `idx` contains the column indices (`k` above) in `U` corresponding to the entries in `u` at that position. The index `-1` is reserved for padded entries. So for instance, if a row `i` of `u` is `[1, 2, 3]`, corresponding to `[i, k3]`, `[i, k1]`, `[i, k2]` in `U`, then `idx[i] = [k3, k1, k2]`.

Our task is to compute the "transpose" `t` of `u`, in the sense that `t` has the same relationship to `transpose(U)` as `u` has to `U`: if the `[i, j]` entry of `u` corresponds to `[i, k]` in `U`, then `[i, j]` of `t` should correspond to `[k, i]`.

The entry `[i, j]` of `u` belongs in row `idx[i, j] = jj` of the transpose. The difficulty is figuring out the *column* index: we need to know which index in `u[jj, :]` belongs to `i`. We'll call this "reverse" index `ii` in the code.

For my particular usecase, we'll need to compute this transpose many times for different `u`, but can expect `idx` to stay the same: the values we are transposing might change, but not the underlying structure.

This is rather confusing, so here is an example.

```
u = np.array([[3, 0],
              [2, 1],
              [4, 0]], dtype=int)

idx = np.array([[1, -1],
                [2, 0],
                [1, -1]], dtype=int)
```

with the result of our `transpose`:

```
t = [[1, 0],
     [4, 3],
     [2, 0]]
```

Here is a naive pure-python implementation.

```
def naive_transpose(u, idx):
    t = np.zeros_like(u)
    for i in range(u.shape[0]):
        for j in range(u.shape[1]):
            jj = idx[i, j]
            if jj == -1:
                break
            for ii in range(u.shape[1]):
                if idx[jj, ii] == i:
                    break
            t[i, j] = u[jj, ii]
    return t
```

Clearly, this is inefficient: we run through each row of the `idx` array many times to search for a match. But it works!

**`numba` + indexing solution**

This solution has two parts: First, we make an `[n, m, 2]` transpose index array that contains `idx` in the `[:, :, 0]` entries, and the "reverse" indices in `[:, :, 1]`. This part is implemented using `numba`, which provides a just-in-time (`jit`) compiler for a subset of `python`, speeding up the `for` loops considerably.

We then use some ✨ advanced indexing ✨ to collect the entries of `t` out of `u` based on the indices in our transpose index array.

This approach has the advantage of separating out the complicated index-finding part from the actual array operations: we'll only need to compute the transpose index array once, and then can re-use it for different `u`s with the same structure.

```
import numba
import numpy as np

def get_transpose_idx(idx):
    transpose = np.zeros((*idx.shape[:2], 2), dtype=int)
    return _get_transpose_idx(idx.astype(int), transpose)

@numba.jit(nopython=True)
def _get_transpose_idx(idx, transpose):
    n, m = idx.shape
    for i in range(n):
        for j in range(m):
            # what entry are we inverting?
            jj = idx[i, j]
            if jj == -1:
                ii = -1
            else:
                ii = find(idx[jj], i)
            transpose[i, j] = [jj, ii]
    return transpose

@numba.jit(nopython=True)
def find(array, target):
    n = len(array)
    for i in range(n):
        if array[i] == target:
            return i
    return -1
```

(For better performance, one needs to either find a way to make use of accumulated knowledge about "reverse" indices when going through `idx`, or optimise the lookup, for example by switching to a `dict`, where average lookup complexity is better.)
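A sketch of that `dict` variant (plain python, shown only for the lookup structure; to keep the `numba` speedup one would need its typed dict):

```python
import numpy as np

def get_transpose_idx_dict(idx):
    """Build the [n, m, 2] transpose index array with O(1) average lookups."""
    n, m = idx.shape
    # map (row in U, column in U) -> column in u
    pos = {}
    for i in range(n):
        for j in range(m):
            if idx[i, j] != -1:
                pos[(i, idx[i, j])] = j
    transpose = np.full((n, m, 2), -1, dtype=int)
    for i in range(n):
        for j in range(m):
            jj = idx[i, j]
            if jj != -1:
                transpose[i, j] = [jj, pos[(jj, i)]]
    return transpose
```

This trades the inner linear search for a single pass that records every entry's position up front.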

The transpose itself can be achieved via

```
t = u[t_idx[:, :, 0], t_idx[:, :, 1], ...]
```

adding `:` in place of `...` for additional dimensions of `u`.

This approach should be reasonably fast, but please run your own benchmarks if you end up using it!
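For completeness, here is a sketch of the `scipy.sparse` route from the update above, using the example arrays (this ignores any extra trailing dimensions of `u`):

```python
import numpy as np
from scipy import sparse

u = np.array([[3, 0], [2, 1], [4, 0]], dtype=int)
idx = np.array([[1, -1], [2, 0], [1, -1]], dtype=int)

n, m = idx.shape
rows = np.repeat(np.arange(n), m)  # row index in U for every entry of u
cols = idx.ravel()                 # column index in U
mask = cols != -1                  # drop padded entries

U = sparse.coo_matrix((u.ravel()[mask], (rows[mask], cols[mask])), shape=(n, n))
Ut = U.T  # the transpose is now trivial
```

Once the data lives in a sparse matrix, the bookkeeping with "reverse" indices disappears entirely, at the cost of converting between formats.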
