DongXu-ADDG committed on
Commit bad8e4c · verified · 1 Parent(s): 8d1b836

Upload README.md

Files changed (1)
  1. README.md +7 -8
README.md CHANGED
@@ -31,7 +31,6 @@ This repository is mainly intended for:
  - initialization for downstream molecular property prediction
  - autoregressive molecular string modeling research
 
- **Repository naming note.** In the paper, the improved fragment-link representation is called **FragLink**. In the current Hugging Face repository layout, the corresponding folders appear as **FragSeqV2**. Both names are kept below so that readers can map the paper terminology to the released files.
 
  ## Model Sources
 
@@ -63,7 +62,7 @@ The current repository layout contains checkpoints grouped by representation and
  - 278M
  - 650M
 
- ### FragSeqV2 *(FragLink in the paper)*
+ ### FragLink
  - 1M
  - 4M
  - 16M
@@ -107,7 +106,7 @@ The paper studies five string representations:
  - **DeepSMILES**
  - **SAFE**
  - **FragSeq**
- - **FragLink** *(stored as `FragSeqV2` in the current repo layout)*
+ - **FragLink**
 
  ### Scaling Grid
 
@@ -158,21 +157,21 @@ Downstream transfer is evaluated on nine MoleculeNet benchmarks:
 
  | Task | Metric | Best Released Representation | Score |
  |---|---:|---|---:|
- | BACE | ROC-AUC ↑ | FragLink / FragSeqV2 | 89.7 |
+ | BACE | ROC-AUC ↑ | FragLink | 89.7 |
  | HIV | ROC-AUC ↑ | SAFE* | 83.3 |
  | BBBP | ROC-AUC ↑ | DeepSMILES | 97.8 |
  | SIDER | ROC-AUC ↑ | FragSeq | 68.8 |
  | Tox21 | ROC-AUC ↑ | FragSeq | 83.7 |
  | ClinTox | ROC-AUC ↑ | SMILES / DeepSMILES | 99.8 |
  | ESOL | RMSE ↓ | DeepSMILES | 0.362 |
- | FreeSolv | RMSE ↓ | FragLink / FragSeqV2 | 1.095 |
- | Lipophilicity | RMSE ↓ | FragLink / FragSeqV2 | 0.593 |
+ | FreeSolv | RMSE ↓ | FragLink | 1.095 |
+ | Lipophilicity | RMSE ↓ | FragLink | 0.593 |
 
  \* The paper notes that SAFE reaches the highest HIV score, but also points out that SAFE only covers about 83% of the original HIV test set in that comparison. For full context, please check the paper.
 
  ### Task-Level Takeaways
 
- - **FragLink / FragSeqV2** is especially strong on **BACE** and the **biophysics regression tasks**.
+ - **FragLink** is especially strong on **BACE** and the **biophysics regression tasks**.
  - **SMILES** and **DeepSMILES** are strong on **HIV**, **BBBP**, and **ClinTox**.
  - **FragSeq** is particularly competitive on **SIDER** and **Tox21**.
  - There is **no single best representation for every downstream task**.
@@ -220,7 +219,7 @@ Examples:
  - `SMILES 152M`
  - `DeepSMILES 85M`
  - `FragSeq 43M`
- - `FragSeqV2 152M`
+ - `FragLink 152M`
  - `SAFE 278M`
 
  Then load the selected checkpoint with the official codebase and the matching configuration.
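
Reviewer note: this commit drops the FragSeqV2 name from the README, but the diff does not show whether the checkpoint folders themselves were renamed. A minimal sketch of how a reader might resolve a label such as `FragLink 152M` under either name is given below. Everything in it is hypothetical: the `representation/size` folder layout, the helper names, and the `FragLink` to `FragSeqV2` alias (taken only from the removed naming note) are assumptions, not part of the official codebase.

```python
# Hypothetical helper; none of these names come from the repository or its codebase.
# Assumption: the removed naming note said FragLink checkpoints were stored
# in folders named "FragSeqV2". Whether that still holds is not shown here.
PAPER_TO_LEGACY_FOLDER = {"FragLink": "FragSeqV2"}


def parse_checkpoint_label(label: str) -> tuple[str, str]:
    """Split a label such as 'FragLink 152M' into (representation, size)."""
    representation, size = label.split()
    return representation, size


def candidate_folders(label: str) -> list[str]:
    """Return folder paths worth trying: paper name first, legacy name second.

    Assumption: folders are laid out as '<representation>/<size>'.
    """
    representation, size = parse_checkpoint_label(label)
    names = [representation]
    legacy = PAPER_TO_LEGACY_FOLDER.get(representation)
    if legacy is not None:
        names.append(legacy)
    return [f"{name}/{size}" for name in names]


if __name__ == "__main__":
    for label in ("FragLink 152M", "SAFE 278M"):
        print(label, "->", candidate_folders(label))
```

Trying the paper name first and falling back to the legacy name keeps the lookup working whichever way the folders are actually named.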