Upload README.md
README.md
CHANGED
@@ -31,7 +31,6 @@ This repository is mainly intended for:
 - initialization for downstream molecular property prediction
 - autoregressive molecular string modeling research
 
-**Repository naming note.** In the paper, the improved fragment-link representation is called **FragLink**. In the current Hugging Face repository layout, the corresponding folders appear as **FragSeqV2**. Both names are kept below so that readers can map the paper terminology to the released files.
 
 ## Model Sources
 
@@ -63,7 +62,7 @@ The current repository layout contains checkpoints grouped by representation and
 - 278M
 - 650M
 
-###
+### FragLink
 - 1M
 - 4M
 - 16M
@@ -107,7 +106,7 @@ The paper studies five string representations:
 - **DeepSMILES**
 - **SAFE**
 - **FragSeq**
-- **FragLink**
+- **FragLink**
 
 ### Scaling Grid
 
@@ -158,21 +157,21 @@ Downstream transfer is evaluated on nine MoleculeNet benchmarks:
 
 | Task | Metric | Best Released Representation | Score |
 |---|---:|---|---:|
-| BACE | ROC-AUC ↑ | FragLink
+| BACE | ROC-AUC ↑ | FragLink | 89.7 |
 | HIV | ROC-AUC ↑ | SAFE* | 83.3 |
 | BBBP | ROC-AUC ↑ | DeepSMILES | 97.8 |
 | SIDER | ROC-AUC ↑ | FragSeq | 68.8 |
 | Tox21 | ROC-AUC ↑ | FragSeq | 83.7 |
 | ClinTox | ROC-AUC ↑ | SMILES / DeepSMILES | 99.8 |
 | ESOL | RMSE ↓ | DeepSMILES | 0.362 |
-| FreeSolv | RMSE ↓ | FragLink
-| Lipophilicity | RMSE ↓ | FragLink
+| FreeSolv | RMSE ↓ | FragLink | 1.095 |
+| Lipophilicity | RMSE ↓ | FragLink | 0.593 |
 
 \* The paper notes that SAFE reaches the highest HIV score, but also points out that SAFE only covers about 83% of the original HIV test set in that comparison. For full context, please check the paper.
 
 ### Task-Level Takeaways
 
-- **FragLink
+- **FragLink** is especially strong on **BACE** and the **biophysics regression tasks**.
 - **SMILES** and **DeepSMILES** are strong on **HIV**, **BBBP**, and **ClinTox**.
 - **FragSeq** is particularly competitive on **SIDER** and **Tox21**.
 - There is **no single best representation for every downstream task**.
@@ -220,7 +219,7 @@ Examples:
 - `SMILES 152M`
 - `DeepSMILES 85M`
 - `FragSeq 43M`
-- `FragLink 152M`
+- `FragLink 152M`
 - `SAFE 278M`
 
 Then load the selected checkpoint with the official codebase and the matching configuration.
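
The `Representation Size` labels listed under "Examples" (such as `FragLink 152M`) could be split into their parts before picking a checkpoint. The snippet below is only a minimal sketch: the helper name `parse_checkpoint_label` and the `CheckpointLabel` type are hypothetical and not part of the official codebase, and it assumes every label is a representation name followed by an integer size with an `M` suffix, as in the examples shown. It does not load a checkpoint; that step still requires the official codebase and the matching configuration.

```python
import re
from typing import NamedTuple


class CheckpointLabel(NamedTuple):
    """Hypothetical container for a parsed label such as 'FragLink 152M'."""
    representation: str   # e.g. "FragLink"
    size_label: str       # e.g. "152M"
    num_parameters: int   # e.g. 152_000_000 (nominal, derived from the label)


# Assumption: labels look like "<Representation> <integer>M".
_LABEL_RE = re.compile(r"^(?P<rep>[A-Za-z0-9]+)\s+(?P<num>\d+)M$")


def parse_checkpoint_label(label: str) -> CheckpointLabel:
    """Split a label into representation name and nominal parameter count."""
    match = _LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized checkpoint label: {label!r}")
    size_label = match["num"] + "M"
    return CheckpointLabel(match["rep"], size_label, int(match["num"]) * 1_000_000)


if __name__ == "__main__":
    for text in ["SMILES 152M", "DeepSMILES 85M", "FragSeq 43M", "FragLink 152M", "SAFE 278M"]:
        print(parse_checkpoint_label(text))
```

Parsing the label first makes it easier to check that the representation and size chosen match a folder that actually exists in the repository before handing the path to the loader.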