Try to run this code:
sage: sr = mq.SR(4,4,4,8, aes_mode=True, star=True, allow_zero_inversions=True)
sage: F,s = sr.polynomial_system()
in a fresh SAGE session and wait for it to terminate (about 17 s on my 2.33 GHz system). A second run in the same session takes only 2 s.
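The cold/warm difference can be measured with a small helper like the one below. This is only a sketch: `time_two_runs` is a name made up for illustration, not part of SAGE, and it records nothing beyond wall-clock time.

```python
import time


def time_two_runs(func):
    """Call func() twice and return the wall-clock duration of each call.

    Hypothetical helper for illustration only; it is not part of SAGE.
    """
    durations = []
    for _ in range(2):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return tuple(durations)
```

In a fresh session this would be called as `first, second = time_two_runs(lambda: sr.polynomial_system())`, with `sr` constructed as above.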
I profiled this with hotshot like this:
sage: import hotshot
sage: filename = "pythongrind.prof"
sage: prof = hotshot.Profile(filename, lineevents=1)
sage: prof.run("sr.polynomial_system()")
<hotshot.Profile instance at 0x414c11ec>
sage: prof.close()
and converted the result to cachegrind/calltree format
hotshot2calltree -o cachegrind.out.42 pythongrind.prof
to inspect the result with kcachegrind. Apparently, both sr.round_polynomials and sr.key_schedule_polynomials call MatrixSpace.get_action_impl, which in turn calls pushout, which calls construction_tower. construction_tower creates *7164* polynomial rings, and this ring construction accounts for 85% of the entire runtime.
So most of the time is apparently spent in coercion (which would also explain the better runtime of the second run), and I believe this is due to a bug.
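For anyone who wants to reproduce the measurement without hotshot (the module was removed in Python 3), the following sketch does roughly the same thing with the standard cProfile and pstats modules. The helper names `profile_call` and `count_calls` are made up for this example, and I am assuming the function of interest is visible to the profiler under its plain name.

```python
import cProfile
import io
import pstats


def profile_call(func, top=15):
    """Run func() under cProfile and return (result, stats, report_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stats, stream.getvalue()


def count_calls(stats, name):
    """Sum the call counts of all profiled functions named `name`."""
    total = 0
    # stats.stats maps (file, line, funcname) -> (cc, nc, tt, ct, callers),
    # where nc is the total number of calls.
    for (_file, _line, funcname), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        if funcname == name:
            total += ncalls
    return total
```

With this, `result, stats, report = profile_call(lambda: sr.polynomial_system())` followed by `count_calls(stats, "construction_tower")` should show how often construction_tower is entered, and `report` lists the most expensive functions by cumulative time.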