Skills and Resources
Competitive teams are expected to have foundational analytical skills relevant to business and data analytics. Participants should have working knowledge of predictive analytics, data mining, or machine learning techniques, and be able to use appropriate analytical tools (such as R, Python, SPSS, Stata, or other relevant software) to manage and analyze relatively large datasets. Teams should also be able to transform and prepare variables as needed and to select suitable technologies and methodologies to address the problem and generate actionable insights.
Team
Teams may consist of one or two participants. Solo entries are welcome, and two-person teams are encouraged to collaborate and share responsibilities.
Protocol
Participants will receive an overview of the problem along with comprehensive documentation of the dataset, including a data dictionary. Teams will be expected to explore the dataset using both descriptive and inferential statistical methods, develop a predictive model, submit their final predictions as a single-column CSV file, and include a concise written report delivered in PDF format.
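As an illustration of the exploration step described above, the following is a minimal Python sketch of a descriptive summary and one simple inferential result (a confidence interval for a mean). It uses only the standard library. The function names and the sample values are hypothetical and are not taken from the competition dataset. The interval uses a normal approximation, which is an assumption of this sketch; a t-based interval is generally preferable for small samples.

```python
import math
import statistics
from statistics import NormalDist


def describe(values):
    """Return basic measures of location and dispersion for a numeric column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean (an assumption)."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical values standing in for one numeric variable.
sample = [12.0, 15.5, 11.2, 14.8, 13.1, 16.4, 12.9, 14.0]
summary = describe(sample)
low, high = mean_confidence_interval(sample)
```

In practice teams are free to use any of the tools named in this document; the sketch only shows the kind of output the descriptive and inferential sections of the report are expected to discuss.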
Submissions will be evaluated based on the criteria below:
Descriptive Statistical Summary of the Given Dataset (10%)
Provides a clear and accurate descriptive statistical overview of the given dataset, including appropriate measures of location and dispersion and relevant visual summaries.
Inferential Statistical Analysis (20%)
Conducts suitable inferential statistical analyses on the dataset, including hypothesis testing, confidence intervals, or other relevant techniques, with correct interpretation of results.
Analytical Approach and Modeling Method (20%)
Presents a concise and well-structured overview of the analytical approach and modeling techniques employed, clearly outlining the workflow and key steps.
Rationale for Methodology Selection (20%)
Offers a clear and logical justification explaining why the chosen methodology or analytical techniques are appropriate for the given problem context and data characteristics.
Prediction Vector Submission and Model Performance (30%)
Submits a complete and correctly formatted prediction vector for the test dataset. The accuracy and effectiveness of the predictions will be evaluated by the judges using predefined performance metrics.
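To make "complete and correctly formatted" concrete, the following is a minimal Python sketch of writing a single-column CSV prediction file with two basic checks: one prediction per test row, and no missing values. The function name, the `prediction` header, and the `expected_rows` parameter are assumptions made for illustration. This document does not specify a header or a file name, so teams should follow the dataset documentation where it differs.

```python
import csv
import math


def write_prediction_vector(predictions, path, expected_rows, header="prediction"):
    """Write predictions as a single-column CSV after basic sanity checks.

    The header name is hypothetical; pass header=None to omit the header row.
    """
    # Completeness check: one prediction per row of the test dataset.
    if len(predictions) != expected_rows:
        raise ValueError(
            f"expected {expected_rows} predictions, got {len(predictions)}"
        )
    # Reject missing values (None or NaN).
    for index, value in enumerate(predictions):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            raise ValueError(f"missing prediction at row {index}")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow([header])
        # One value per line keeps the file single-column.
        writer.writerows([value] for value in predictions)
```

A call such as `write_prediction_vector(model_output, "predictions.csv", expected_rows=len(test_rows))` would then fail early on a length mismatch instead of producing an incomplete file; `model_output` and `test_rows` are placeholder names.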