# Calculate the mean energy of the sample.
# We do not care too much about which temperature we got,
# just whether we are in equilibrium.
Tmu = np.fromiter((s[-1] for s in smpls), float).mean()
Tmu *= 2/un.kB/3
Tmu
# Get weights for this temperature
plt.title(f'Target temperature T={Tmu:.2f} K')
wgh, idx = get_sample_weights(smpls, Tmu, border=False, debug=True)
# and now for the actual target temperature T
plt.title(f'Target temperature T={T:.2f} K')
wgh, idx = get_sample_weights(smpls, T, border=False, debug=True)
Distribution reshape
This code is an implementation of the idea of replacing the crude weighting produced by the Metropolis-Hastings algorithm with weights derived directly from the PDF of the target distribution. This makes the acceptance ratio essentially 100% and allows for tricks like changing the target distribution (e.g. shifting the target temperature of the sample, or even scanning a temperature range using a prior distribution generated with plan_T_scan).

The implementation here derives the weights directly from the PDF of the target distribution by assigning a non-uniform bin to each data point and calculating the change in the CDF across each bin. The resulting data uses weighting in a similar way as M-H sampling does, but exploits the knowledge of the sample distribution and the target PDF to make it more effective.
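To make the idea concrete, below is a minimal sketch of deriving such weights for a one-dimensional sample, assuming a normal target distribution with known mean mu and width sigma. The helper name cdf_bin_weights is hypothetical; this is not the package implementation, only an illustration of the CDF-difference construction.

```python
# Minimal sketch of the reweighting idea (not the package implementation).
# Each sorted energy owns a bin spanning the midpoints to its neighbours;
# its weight is the change of the target CDF across that bin.
import numpy as np
from scipy.stats import norm

def cdf_bin_weights(e, mu, sigma):
    '''Weights for 1D samples `e` under a normal target N(mu, sigma).'''
    idx = np.argsort(e)          # sort samples by energy
    es = np.asarray(e)[idx]      # sorted energies
    # Non-uniform bin edges: midpoints between consecutive sorted energies,
    # closed with the outermost samples (no weight outside the data range).
    edges = np.concatenate(([es[0]], (es[1:] + es[:-1]) / 2, [es[-1]]))
    # Weight of each sample = change of the target CDF across its bin.
    w = norm.cdf(edges[1:], mu, sigma) - norm.cdf(edges[:-1], mu, sigma)
    w /= w.sum()                 # normalise to unit total weight
    return w, idx
```

Densely sampled regions automatically receive small per-sample weights, since the target probability mass is split over many narrow bins, while sparsely sampled regions get correspondingly larger weights.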
get_sample_weights
get_sample_weights (data, T, sigma_scale=1.0, border=False, debug=False)
Generate data weights that reshape the probability distribution of data (in sample format) into the normal distribution corresponding to temperature T. The actual distribution traces the normal curve. Normally, no weight is given to data outside the input data range. This may skew the resulting distribution and shift its parameters, but it avoids nasty border artefacts in the form of large weights given to border samples to account for the rest of the domain. You can switch this behaviour on by passing the border parameter (not recommended).
Input

- data - list of samples generated by HECSS sampler ([n, i, x, f, e])
- T - temperature in kelvin
- sigma_scale - scaling factor for variance. Defaults to 1.0.
- border - make border samples account for unrepresented part of domain
- debug - plot diagnostic plots

Output (weights, index)

- weights - array of weights in sorted energy order
- index - indexing array which sorts samples by energy
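A hedged usage sketch follows, assuming a sample list smpls produced by the HECSS sampler as in the cells above; the weighted-mean computation is only an illustration, not part of the API.

```python
import numpy as np

# Weight an existing HECSS sample list `smpls` towards a 300 K
# normal energy distribution (300 K is an arbitrary example value).
wgh, idx = get_sample_weights(smpls, 300, sigma_scale=1.0)

# `idx` sorts the samples by energy and `wgh` follows that order, so a
# weighted mean energy can, for instance, be estimated like this:
e = np.array([s[-1] for s in smpls])       # sample energies
e_mean = (wgh * e[idx]).sum() / wgh.sum()  # weighted mean energy
```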
make_sampling
make_sampling (data, T, sigma_scale=1.0, border=False, probTH=0.25, Nmul=4, N=None, nonzero_w=True, debug=False)
Generate a new sample with a normal energy probability distribution corresponding to temperature T and the size of the system inferred from data. The output is generated by multiplying samples proportionally to the weights generated by get_sample_weights and assuming the final dataset will be Nmul times longer (or of length N, which takes precedence). If nonzero_w is True, the data multipliers in the (probTH, 1) range will be clamped to 1 to avoid losing low-probability data. This will obviously deform the distribution but may be beneficial for the interaction model constructed from the data. The data on output will be ordered by increasing energy and re-numbered. The configuration index is retained.
Input

- data - list of samples generated by HECSS sampler ([n, i, x, f, e])
- T - temperature in kelvin
- sigma_scale - scaling factor for variance. Defaults to 1.0.
- border - make border samples account for unrepresented part of domain
- probTH - threshold for probability (N*weight) clamping
- Nmul - data multiplication factor. The length of output data will be approximately Nmul*len(data)
- N - approximate output data length. Takes precedence over Nmul
- nonzero_w - prevent zero weights for data with weights in the (probTH, 1) range
- debug - plot diagnostic plots
Output

Weighted, sorted-by-energy and re-numbered samples as a new list. The data in the list is just reference-duplicated, not copied; thus the data elements should not be modified. The format is the same as the data produced by the HECSS sampler.
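To make the multiplication and clamping logic described above concrete, here is a simplified, hedged sketch. The helper multiply_samples is hypothetical, not the package implementation; it assumes weights and idx as returned by get_sample_weights, and it omits the re-numbering step.

```python
import numpy as np

def multiply_samples(data, weights, idx, Nmul=4, probTH=0.25, nonzero_w=True):
    '''Replicate samples in proportion to `weights`, aiming at ~Nmul*len(data).'''
    n_target = Nmul * len(data)
    mult = n_target * weights / weights.sum()    # ideal, real-valued multipliers
    if nonzero_w:
        # Keep low-probability samples: clamp multipliers in (probTH, 1) to 1.
        mult = np.where((mult > probTH) & (mult < 1), 1, mult)
    mult = np.round(mult).astype(int)
    out = []
    for m, i in zip(mult, idx):                  # idx orders the samples by energy
        out.extend([data[i]] * m)                # reference-duplicated, not copied
    return out                                   # re-numbering omitted in this sketch
```

In the actual make_sampling the target length can also be given directly through N, which takes precedence over Nmul.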
Target temperature control
If the amplitude of displacement is not tightly controlled by the width-searching algorithm (the w_search parameter in the HECSS constructor), the mean energy of the sample will be shifted from the target value. The distribution-reshaping algorithm in make_sampling may shift the result closer to the target but cannot create data out of thin air. This is illustrated in the example below. In the first case (Section 1.1) we have forced the algorithm to create a distribution at the initial target temperature (T), which produces a very distorted sample.

In the second case (Section 1.2) the target temperature is selected well within the data domain (Tmu, which is actually the mean of the sample), which generates a well-balanced and uniform sampling of the target distribution, although at a slightly shifted temperature.
Shifted target
wd = make_sampling(smpls, T, probTH=0.25, Nmul=4,
                   nonzero_w=True, border=False, debug=True)
plt.show()

plot_stats(wd, T, sqrN=True, show=False)
# plt.savefig('AUX/hecss_post.pdf', bbox_inches='tight')
Target at the mean
wd = make_sampling(smpls, Tmu, probTH=0.25, Nmul=4,
                   nonzero_w=True, border=False, debug=True)
plt.show()

plot_stats(wd, Tmu, sqrN=True, show=False)
# plt.savefig('AUX/hecss_post.pdf', bbox_inches='tight')
Manual temperature scan
The ability to re-shape the sample makes it possible to scan a whole range of temperatures by adding samples from several temperatures and subsequently drawing, from the whole set, samples representing any target temperature in the covered range. The sampled temperatures may be selected manually or using the automatic procedure from plan_T_scan. Here we just select a few temperatures and run the procedure to test the weighting on a non-gaussian prior.
uni = {TT:hecss.sample(TT, N) for TT in (260, 300, 340)}

e_dist = {t:np.array([s[-1] for s in d]) for t, d in uni.items()}
e_uni = np.concatenate(tuple(e_dist.values()))
usmp = []
for s in uni.values():
    usmp += s

for TT, ed in e_dist.items():
    plt.hist(ed, bins=40, label=f'T={TT}K', alpha=0.25, range=(e_uni.min(), e_uni.max()))
plt.hist(e_uni, bins=40, histtype='step', stacked=True, range=(e_uni.min(), e_uni.max()))
plt.legend();
wd = make_sampling(usmp, T=280, N=4*N, nonzero_w=True, debug=True)
print(len(usmp), len(wd))
plt.legend(loc='upper right', bbox_to_anchor=(1.0, 0.95));
plt.show()

plot_stats(wd, sqrN=True, show=False)
plt.savefig(f'AUX/T_fit_{T=:.0f}K.pdf', bbox_inches='tight')

1500 2026
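Since the combined set usmp covers roughly the 260-340 K range, the same data can be reshaped to any other target temperature inside it. A hedged example follows; the value 320 K and the name wd_320 are arbitrary choices, and only the calls shown above are used.

```python
# Draw a sample representing another target temperature from the
# same combined, multi-temperature prior.
wd_320 = make_sampling(usmp, T=320, N=4*N, nonzero_w=True)
plot_stats(wd_320, sqrN=True, show=False)
plt.show()
```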