I would make the lists that I would call the average lists first.
Like Empire: they aren't bad at anything and aren't great at anything, they are average.
You look at an army list in WMM that fits how Empire is played, tweak it, and then you "set it in stone".
Now you have something to revolve around.
Next you go out on a tangent: you take a list that is the most extreme in one aspect.
Let's say, for the sake of argument, that Wood Elf Glade Guard is the most all-round powerful ranged unit in the game. By setting its rules we now have an average and a maximum, and we can fit other units in between those two.
We also pick the worst ranged unit (one that still has ranged weaponry) and set it as the minimum.
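To show what I mean by fitting units in between the minimum, the average and the maximum, here is a rough Python sketch. Everything in it is my own assumption for illustration: the function name `place_between`, the idea of giving each unit a single "rating", and all the numbers. None of it comes from WMM or WMFB, and a straight line between the anchors is only one possible way to do it.

```python
def place_between(rating, worst, average, best):
    """Points cost for a unit rated somewhere between three fixed anchors.

    Each anchor is a (rating, cost) pair. The cost is read off a straight
    line between the two anchors on either side of the unit's rating.
    All ratings and costs are hypothetical.
    """
    (r_lo, c_lo), (r_mid, c_mid), (r_hi, c_hi) = sorted([worst, average, best])
    if not r_lo < r_mid < r_hi:
        raise ValueError("the three anchors need three different ratings")
    if not r_lo <= rating <= r_hi:
        raise ValueError("rating is outside the minimum/maximum anchors")
    if rating <= r_mid:
        r0, c0, r1, c1 = r_lo, c_lo, r_mid, c_mid
    else:
        r0, c0, r1, c1 = r_mid, c_mid, r_hi, c_hi
    return c0 + (c1 - c0) * (rating - r0) / (r1 - r0)


# Made-up anchors for ranged units: (rating, points cost).
WORST = (1, 30)     # worst unit that still has ranged weaponry
AVERAGE = (5, 55)   # the "set in stone" average list's ranged unit
BEST = (10, 90)     # the most powerful ranged unit

# A new unit judged a bit better than average at shooting.
print(place_between(7, WORST, AVERAGE, BEST))  # 69.0
```

The cost that comes out is only a starting value for the first run of lists; playtesting still decides whether it holds.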
We do this with all aspects of the game, using Warmaster FB as a loose guide when it comes to abilities and costs.
Now we flesh out the army lists: things not present in WMM are added as they are in WMFB, with a note that their abilities and value are to be playtested.
We now have a first run of army lists, so the playtesting can start.
One thing that could be done is to set the rule that any WMM army should be able to face any of the new armies in a balanced game...
During playtesting, notes are taken on how things perform. Players are to cheat and break the system as much as possible to find loopholes and broken rules.
You slowly patch the game up and try to keep the patching simple.
Also, when patching, make minimal changes: take the change you want to make, and then cut it down to half.
And then hopefully in the end you have a balanced game.
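The "cut the change down to half" rule for patching is simple enough to write down. This is just a sketch with made-up numbers; the function name `damped_patch` and the example cost are hypothetical, not taken from any army list.

```python
def damped_patch(current, wanted, damping=0.5):
    """Apply only part of a planned change (half of it by default)."""
    return current + (wanted - current) * damping


# A unit costs 70 points and playtest notes suggest it should cost 90.
# Instead of jumping straight to 90, only half of the change is applied.
print(damped_patch(70, 90))  # 80.0
```

If the unit still overperforms in the next round of playtesting, you patch again the same way, so the value creeps toward the right number instead of overshooting it.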