Sorry, I've actually just figured it out. I put it into a storyboard, and I need a 120 ms gap between Change Data Source and Play Video. So I guess it has something to do with loading.
Right now, I have it set up as follows:
- 0ms - stop playing the video, change data source - data offset, play text animations
- 120ms - start playing videos
And it works as intended.
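
For anyone who wants the same sequencing in code rather than in a storyboard, here is a rough sketch in Python. It is not tied to any particular media framework: `FakePlayer`, `switch_source`, and the method names are made up for illustration, and the 120 ms value is just the delay that happened to work for me, not a guaranteed load time.

```python
import asyncio

# Delay between changing the source and starting playback.
# 120 ms is the value that worked in my case; treat it as an assumption.
LOAD_DELAY_S = 0.120


class FakePlayer:
    """Hypothetical stand-in for a real player; it only records calls."""

    def __init__(self):
        self.calls = []

    def stop(self):
        self.calls.append("stop")

    def set_source(self, source):
        self.calls.append(f"set_source:{source}")

    def play_text_animations(self):
        self.calls.append("text_animations")

    def play(self):
        self.calls.append("play")


async def switch_source(player, new_source, delay_s=LOAD_DELAY_S):
    # 0 ms: stop the video, change the data source, start text animations.
    player.stop()
    player.set_source(new_source)
    player.play_text_animations()
    # Give the new source time to load before playing.
    await asyncio.sleep(delay_s)
    # 120 ms: start playing the video.
    player.play()


def run_demo():
    player = FakePlayer()
    asyncio.run(switch_source(player, "clip2"))
    return player.calls
```

If the player exposes a "media opened" or "loaded" event, waiting on that instead of a fixed delay would probably be more robust, since load time can vary between files and machines.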