Model output diversity drop
Have you tried comparing the results of preview1 and preview3? For the same prompt-seed combination, preview1 gives much more diverse results - different angles, better style understanding. Preview3 has fewer errors on average and slightly better logic, but that doesn't matter when it generates a "photo for a passport" every time. Try generating images under the same conditions with both models to see what I'm talking about. The difference is obvious.
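Something like this is what I mean by "the same conditions" (just a rough sketch - I'm assuming the previews load through diffusers' DiffusionPipeline, and the checkpoint names are placeholders):

```python
import torch
from diffusers import DiffusionPipeline

PROMPT = "1girl, city street at night, from side"
SEEDS = [1, 2, 3, 4]

for path in ("anima-preview1", "anima-preview3"):  # placeholder checkpoint names
    pipe = DiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16).to("cuda")
    for seed in SEEDS:
        # same seed per model, so the only variable left is the model itself
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(PROMPT, generator=generator).images[0]
        image.save(f"{path}-seed{seed}.png")
```

Lay the outputs side by side per seed and the diversity gap jumps out immediately.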
I can propose a solution so the training progress isn't lost: do a block/layer merge -> mix a model that keeps good diversity without much quality loss -> train on that model for a short period and look at the results. If they are good, continue it as the main branch. I know block merges can help SDXL models a lot, but I don't know what Anima's structure looks like. It may be impossible, I know.
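The merge step would look roughly like this (again a sketch - the block-name prefixes are made up, since I don't know Anima's actual module layout, and real checkpoints may nest the state dict differently):

```python
import torch

def block_merge(sd_a, sd_b, ratios, default=0.5):
    """Per-key lerp: out = (1 - r) * A + r * B, with r chosen by block prefix."""
    merged = {}
    for key, wa in sd_a.items():
        r = default
        for prefix, ratio in ratios.items():
            if key.startswith(prefix):
                r = ratio
                break
        merged[key] = torch.lerp(wa.float(), sd_b[key].float(), r)
    return merged

sd_old = torch.load("anima_preview1.ckpt", map_location="cpu")  # diverse
sd_new = torch.load("anima_preview3.ckpt", map_location="cpu")  # cleaner
# made-up prefixes: lean on preview1 early (composition), preview3 late (cleanup)
ratios = {"blocks.0.": 0.2, "blocks.1.": 0.2, "blocks.26.": 0.8, "blocks.27.": 0.8}
torch.save(block_merge(sd_old, sd_new, ratios), "anima_mix.ckpt")
```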
Anyway, my message was about the future direction of model training. The current model is becoming popular not because each new version is much better than the previous one - it's because the model was good from the beginning, and after each release more and more people give it a try. The price paid for the image quality improvements was too high in my opinion. There is no reason to keep developing a model that gives the same result for the same prompt. That's my opinion.
More chaos is just inherent to the early training stages. There's nothing one can really do about it. With preview1 in particular, I think the model was also more diverse because of the weaker-trained backbone, so more of the original Cosmos Predict data influenced the output. With further training, and in particular the freezing of the LLM adapter and focused training on the backbone, Anima began to mimic the biases of its new dataset, which is primarily danbooru.
You could generate with preview1 and give it a second inpainting pass with preview3 to clean up the rough edges, though. Avoiding quality tags also helps in many cases.
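Roughly this kind of two-pass setup (a sketch assuming diffusers-style pipelines and placeholder checkpoint names; a proper inpainting pipeline with a mask works the same way, img2img is just the simplest version):

```python
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

prompt = "1girl, looking back, dutch angle, sunset"

t2i = AutoPipelineForText2Image.from_pretrained(
    "anima-preview1", torch_dtype=torch.float16).to("cuda")
draft = t2i(prompt).images[0]  # preview1 handles the composition

i2i = AutoPipelineForImage2Image.from_pretrained(
    "anima-preview3", torch_dtype=torch.float16).to("cuda")
# low strength keeps the layout; preview3 only fixes the rough edges
clean = i2i(prompt, image=draft, strength=0.35).images[0]
clean.save("two_pass.png")
```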
As for your training suggestion: that's what finetuners will do, I guess. They will put together high-quality images from other sources that are not available on danbooru and thus introduce more diversity and naturally reset specific booru biases.
Seconding this. A LoRA I trained on preview-2 unfortunately only worked well with preview-2. So I use preview-2 for my base generation at 768px and upscale it with preview-3 to 1280px. Preview-3 surprisingly does a great job at 1280px.
You can always stick with a given preview-n version and use the newer versions to upscale / clean up. I am still using preview-1 to generate certain styles of oekaki; it just works better in my opinion.
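In code, the 768px -> 1280px workflow looks roughly like this (same caveats: placeholder checkpoint and LoRA names, diffusers-style pipelines assumed; strength and resolution are the knobs to tune):

```python
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

prompt = "oekaki, 1girl, simple background"

base = AutoPipelineForText2Image.from_pretrained(
    "anima-preview2", torch_dtype=torch.float16).to("cuda")
base.load_lora_weights("my-preview2-lora")  # the LoRA that only works on preview-2
img = base(prompt, width=768, height=768).images[0]

up = AutoPipelineForImage2Image.from_pretrained(
    "anima-preview3", torch_dtype=torch.float16).to("cuda")
img = img.resize((1280, 1280))  # plain PIL upscale before the refinement pass
final = up(prompt, image=img, strength=0.4).images[0]
final.save("base768_up1280.png")
```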
Just want to mention that all 3 preview versions have had the LLM adapter trained so far - we haven't gotten any version where it was frozen yet. However, it does look like the adapter was trained less from preview 2 to 3 than it was from 1 to 2, which kinda has the same effect anyway.
Outside of that, in my experience preview 3 seems to be a little worse at NL (natural language prompting) than preview 1 or 2, and backgrounds seem to be ever so slightly less detailed. Besides that, tags that weren't represented well before seem to be better / less gacha now, so it's fine imo.
People can always merge new models later. For me, the base model should prioritize anatomy accuracy above all else.