Well, thinking about it some more, this little cosmos is topped off with NYC, the most visually recognizable urban area in the world, thereby transporting the real world (a specific time and place whose significance we might idealize, or at least apprehend) right smack onto this constructed world. That draws us in very efficiently. It's an intelligent idea.
Also, the way the visual construction unfolds guides us through the form of the tune: the melody builds, expands into colorful elaborations, returns to the melodic foundation, and then dissipates.
I think this little film describes to a degree what Pat Metheny talks about in that article quoted in that other thread, i.e.:
That's why, perhaps, we eventually get to this big picture of the NYC cluster as the 'top build'.