In this talk I will introduce the foundations of mathematical programming and optimization theory. I will then review my work on two fronts: algorithms that are provably convergent under weak problem assumptions, and parallel algorithms suited to large-scale, big-data machine learning applications. Nonlinear continuous optimization is a mature and active field that has proven effective in solving problems from a myriad of applications, including engineering and data science. How wide a class of problems a particular algorithm can solve depends on how well its formulation takes advantage of problem structure and geometric properties. Recently developed algorithms that converge reliably and quickly across a broad range of constrained, medium-scale problems have advanced the state of the art. The age of big data, however, requires parallel architectures to carry out an algorithm's procedural steps. I will present a framework that uses problem structure to solve large-scale nonconvex optimization problems, such as those arising in machine learning, quickly and reliably.