Having built a bullet-proof k-fold analytics engine, we have begun the process of migrating it to a parallel computing framework. As the size of the datasets that Knowledge Leaps is processing has increased in terms of volume and quantity, switching to a parallel framework will add scalable improvements in speed and performance. While we had limited the number of cross validations (the k value) to a maximum of 10, we will be able to increase it further with a minimal increase in compute time and much improved accuracy calculations.
Adding parellel-ization to the batch data engineering functionality will also increase the data throughput of the application. Our aim is to deliver a 10X – 20X improvements data throughput on larger datasets.