We compare our method against two baselines: Inpaint-E alone (i.e., without our inference-time coordinate blending algorithm) and our method with tr set to T (i.e., with coordinate blending performed at every inference step). As the results show, both baselines struggle to convey the text prompt while also preserving identity in the edited part.
| Edit prompt | Input | Masked | Inpaint-E only | Ours (tr = T) | Ours |
|---|---|---|---|---|---|
| the target's legs are straight | | | | | |
| the target has a thinner backrest | | | | | |
| the base is smaller | | | | | |
| the target's shade is closed | | | | | |
| it has a smaller top | | | | | |
| target has thick legs | | | | | |