
Predicting Molecular Energy with Chemical-Bond Features

2018-08-09
BCWang



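  • To predict energy directly from the structural formula, one can start from the molecule's SMILES string and convert it into a graph. The graph is split into center atoms and their surrounding bonds, the environment within N atoms of each center atom is written out as a feature string, each string is hashed, and occurrences are counted per hash. The counts are then arranged into a vector, similar to one-hot encoding but holding counts instead of 0/1. For example, for CH3OH with N=1 the features are three C-H, one C-O and one O-H; with N=2, longer paths such as C-O-H are included as well. Restricted to these four features (C-H, C-O, O-H, C-O-H), the feature vector of CH3OH is (3, 1, 1, 1), that of CH2 is (2, 0, 0, 0) and that of CHO is (1, 1, 0, 0). These vectors are then used to predict the energy.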
  • My project instead uses a hand-crafted encoding, because the data include transition states, which cannot be encoded as SMILES: https://github.com/B-C-WANG/MoleculeFeatureToEnergyPredict
  • Dataset: molecules (including transition states) and their energies computed with DFT.
  • Encoding: each molecule is encoded by its bonds and other useful information; see moleculeLib.py.
  • Encoding to an array: similar to one-hot encoding, each entry is 1 if the feature is present and 0 otherwise. (Figure: MoleculeEncode)
  • Results (figure: test set in red, training set in green).
  • The accuracy is low, but the encoding scales well and could be used in MCTS (Monte Carlo tree search); it still needs improvement.
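The featurization described in the first bullet can be sketched in Python as below. This is a minimal sketch under several assumptions: the molecule is given in a hypothetical format (a list of element symbols plus a list of bonded atom-index pairs, with no SMILES parser), bond orders are ignored, and features are simple bond paths of up to N bonds rather than environments grown around each center atom, so it is a path-based stand-in, not the author's exact scheme. A `Counter` keyed by the feature string does the hashing and counting; `path_features` and `to_vector` are illustrative names.

```python
from collections import Counter


def path_features(elements, bonds, max_bonds):
    """Count simple bond paths of 1..max_bonds bonds, keyed by an element string.

    elements: element symbol per atom, e.g. ["C", "H", "O"] (hypothetical input format)
    bonds: undirected (i, j) atom-index pairs; bond orders are ignored
    """
    neighbors = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    counts = Counter()  # the dict hashes each feature string for us

    def walk(path):
        n_bonds = len(path) - 1
        if n_bonds >= 1:
            labels = [elements[i] for i in path]
            forward, backward = "-".join(labels), "-".join(reversed(labels))
            # A path and its reverse are the same feature; keep the smaller string.
            counts[min(forward, backward)] += 1
        if n_bonds == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # simple paths only: never revisit an atom
                walk(path + [nxt])

    for start in range(len(elements)):
        walk([start])
    # Each path was walked once from each of its two ends, so halve the counts.
    return Counter({name: n // 2 for name, n in counts.items()})


def to_vector(counts, vocab):
    """Count vector over a fixed feature vocabulary (0 for absent features)."""
    return [counts.get(name, 0) for name in vocab]
```

With CH3OH given as `["C", "H", "H", "H", "O", "H"]` and bonds `[(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]`, N=2 and the vocabulary `["C-H", "C-O", "H-O", "C-O-H"]`, `to_vector` returns `[3, 1, 1, 1]`; CH2 and CHO give `[2, 0, 0, 0]` and `[1, 1, 0, 0]`. O-H is stored as `H-O` because each path is written in its alphabetically smaller direction. The full N=2 feature set of CH3OH also contains `H-C-H` and `H-C-O` (three of each), which a real vocabulary would include.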
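The 0/1 array encoding and the final energy prediction can be sketched as follows. The post does not say which regression model the project uses; this sketch assumes a linear model whose weights act as per-feature energy contributions, fitted by ridge-regularized least squares and solved with plain Gauss-Jordan elimination so it needs no dependencies. `build_vocab`, `encode_presence`, `fit_ridge`, `predict` and the regularization strength `lam` are hypothetical names; the feature sets can come from any featurizer that yields feature-name strings.

```python
def build_vocab(feature_sets):
    """Sorted list of every feature name seen in the training molecules."""
    return sorted({name for feats in feature_sets for name in feats})


def encode_presence(feats, vocab):
    """One-hot-like array: 1.0 if the feature occurs, else 0.0.

    Features missing from vocab (unseen in training) are ignored.
    """
    return [1.0 if name in feats else 0.0 for name in vocab]


def fit_ridge(X, y, lam=1e-6):
    """Solve (X^T X + lam*I) w = X^T y for the weights w by Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    # A is now diagonal, so each weight is a single division.
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(x, w):
    """Linear energy estimate: sum of per-feature contributions (no intercept)."""
    return sum(xi * wi for xi, wi in zip(x, w))
```

Typical use is `vocab = build_vocab(train_sets)`, `X = [encode_presence(f, vocab) for f in train_sets]`, `w = fit_ridge(X, energies)`, then `predict(encode_presence(new_feats, vocab), w)`. A 0/1 encoding cannot tell one C-H bond from three, so a count vector may carry more information; the small `lam` keeps the system solvable when features are collinear, and a constant-1 column could be appended to get an intercept.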
