We compare our method with two baselines: Inpaint-E only (i.e., without our inference-time coordinate blending algorithm) and our method with tr set to T (i.e., coordinate blending is performed at every inference step). As the results show, both baselines struggle to convey the text prompt while also preserving identity in the edited part.
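
To make the three compared settings concrete, the sketch below models each one as a per-step on/off schedule for coordinate blending. It is only an illustration under stated assumptions: the function name `blend_mask`, the step count `T = 10`, and the intermediate value `tr = 4` are hypothetical, and the choice to blend during the *first* `tr` steps is an assumption, since the text only states that tr = T means blending at every step. Inpaint-E only is modeled here as tr = 0.

```python
def blend_mask(T: int, tr: int) -> list[bool]:
    """For each of T inference steps, say whether coordinate blending is applied.

    Assumption: blending is active for the first `tr` steps. The text only
    fixes the endpoint tr = T (blending at every step).
    """
    if not 0 <= tr <= T:
        raise ValueError("tr must lie in [0, T]")
    return [step < tr for step in range(T)]


T = 10  # hypothetical number of inference steps

variants = {
    "Inpaint-E only": blend_mask(T, 0),  # no blending, modeled as tr = 0
    "Ours tr = T": blend_mask(T, T),     # blending at every step
    "Ours": blend_mask(T, 4),            # hypothetical intermediate tr
}

for name, mask in variants.items():
    print(f"{name}: blending on {sum(mask)}/{T} steps")
```

Under this reading, the two baselines are the two extremes of the same schedule, and the full method sits between them.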

| Text prompt | Input | Masked | Inpaint-E only | Ours tr = T | Ours |
|---|---|---|---|---|---|
| the target's legs are straight | | | | | |
| the target has a thinner backrest | | | | | |
| the base is smaller | | | | | |
| the target's shade is closed | | | | | |
| it has a smaller top | | | | | |
| target has thick legs | | | | | |