The paper uses statistical methods to identify a solution for the Nyke Shoe Company. The problem is to choose a single shoe size to produce regardless of customers' gender and height. The paper considers the statistical methods appropriate to the problem and gives recommendations based on the results.


Introduction

Data

Analysis & Results

Conclusions

Appendix

Table 1: Descriptive statistics for Shoe Size and Height

Table 2: Descriptive statistics for Gender.

Table 3: Regression analysis considering Shoe Size as dependent variable.

Table 4: 95% Confidence interval for Shoe Size

Table 5: Counts for Shoe Size

Graph 1: Histogram and Box plot for Shoe Size

Graph 2: Histogram and Box plot for Height

Graph 3: Scatter plot of Height vs Shoe Size and Sex vs Shoe Size


Introduction

Statistics and data analysis are important in many areas of real life, where they are applied to find appropriate solutions to existing problems. Business information systems perform statistical analysis and related operations to support advanced data analysis and decisions that are both efficient and significant. Companies use these techniques to maximize revenue and profit. Statistical methods are therefore important for the analysis of existing data, and the results can be used to recommend appropriate courses of action for better positioning within target markets. Because statistics is grounded in existing data, it also cannot be dismissed as a basis for forecasting.

Nyke Shoe Company is undergoing financial difficulties and is considering making a single size of shoe regardless of customer demographics such as gender or height. The decision is intended to reduce production costs: manufacturing different sizes requires measuring and cutting materials to the required lengths and fitting them onto different soles, which increases both the cost of production and the time spent creating products. If the company can find a single size that works for a large group of customers, it can minimize production costs without losing sales and can increase profits, which would help it recover from its financial downturn. Statistical analysis of the data is needed to determine which shoe size the company should focus on to solve its current problem.


Data

The dataset provided by the company has 3 variables and 35 data points. The variables are the ‘Shoe Size’, ‘Height’ and ‘Gender’ of the customers. The sizes and heights are recorded in discrete form, as integers; for the analysis that follows they are treated as continuous. The aim of the statistical processes applied is to discover the shoe size that best meets the interests of the company, and conclusions are drawn from the analysis. The dependent variable is the shoe size, while the independent variables are gender and height. Gender is a qualitative variable, and it must be expressed quantitatively before the statistical analysis can be carried out. Therefore, the following dummy variable is created.

Sex = 1 if Gender = Female

Sex = 0 if Gender = Male

The analysis that follows uses the variables above, with the newly defined dummy variable ‘Sex’ in place of ‘Gender’.
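As a minimal sketch of the coding step described above, the dummy variable could be built as follows in Python. The function name encode_sex and the sample labels are hypothetical and serve only as an illustration; they are not part of the company's data set.

```python
# Hypothetical gender labels for illustration; NOT the company's data.
genders = ["Female", "Male", "Male", "Female", "Female"]


def encode_sex(gender: str) -> int:
    """Return 1 for Female and 0 for Male, following the definition of Sex."""
    normalized = gender.strip().lower()
    if normalized == "female":
        return 1
    if normalized == "male":
        return 0
    # Any other label is rejected rather than silently coded.
    raise ValueError(f"Unexpected gender label: {gender!r}")


sex = [encode_sex(g) for g in genders]
print(sex)  # [1, 0, 0, 1, 1]
```

Rejecting unexpected labels is a design choice of this sketch: it prevents a misspelled entry from being silently counted as male or female.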

Analysis & Results

Before a detailed analysis, the data need to be explored. From my perspective, the best starting point is the descriptive statistics of the data. The tables obtained from this analysis are in the Appendix, labelled Table 1 and Table 2. The two tables summarize the basic characteristics of the data set, including the mean, median and mode of the variables. Relative to their respective means, the values of ‘Shoe Size’ vary more than those of ‘Height’.

The mean, median and mode for ‘Shoe Size’ are 9.1429, 9 and 7, respectively.

The mean, median and mode for ‘Height’ are 68.9429, 70 and 70, respectively.
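Summary measures of this kind can be computed with Python's standard statistics module. The sketch below uses a small made-up sample, not the 35 observations analysed in this paper, so its output will not match Table 1. It also computes the coefficient of variation, which is one possible way (an assumption of this sketch) of comparing variability relative to the mean.

```python
import statistics

# Hypothetical shoe sizes for illustration only; NOT the data behind Table 1.
shoe_sizes = [7, 7, 7, 8, 9, 10, 11, 12]

mean_size = statistics.mean(shoe_sizes)      # arithmetic mean
median_size = statistics.median(shoe_sizes)  # middle value of the sorted data
mode_size = statistics.mode(shoe_sizes)      # most frequent value
stdev_size = statistics.stdev(shoe_sizes)    # sample standard deviation

# Coefficient of variation: spread expressed as a fraction of the mean.
cv = stdev_size / mean_size

print(f"mean={mean_size}, median={median_size}, mode={mode_size}, cv={cv:.3f}")
```

Because the coefficient of variation is unit-free, it allows a variable measured in shoe sizes to be compared with one measured in units of height.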

The second table, Table 2, shows that shoes made for females account for almost 50% of production, with the remainder made for males. The observed share of 51.43% deviates only slightly from 50%, so both genders need to be considered in the analysis. To better interpret and understand the data, the corresponding histograms and box plots are examined. The output is presented in the Appendix as Graphs 1 and 2. These graphs show that the distributions of ‘Shoe Size’ and ‘Height’ are not normal. The box plot shows a slight positive skew in the distribution of ‘Shoe Size’, which is further supported by the coefficient of skewness in Table 1. The distribution of ‘Height’ is notably negatively skewed, as evidenced by both the table and the chart.

The next step is to draw the scatter plots, which are shown as Graph 3 in the Appendix. In constructing them, ‘Shoe Size’ was used as the dependent variable. The graphs show that shoe size varies across genders and heights: generally, taller people need a larger shoe size, and males need larger shoe sizes than females. These patterns suggest that relationships exist between ‘Shoe Size’ and the independent variables. Running a regression analysis on the data gives the results in Table 3.
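A regression of ‘Shoe Size’ on ‘Height’ and the dummy variable ‘Sex’ can be sketched with ordinary least squares in Python. The example below is an illustration under stated assumptions: it uses NumPy, and the data are synthetic and noise-free, generated from made-up coefficients so that the fit can be checked. Neither the data nor the coefficients are those reported in Table 3.

```python
import numpy as np

# Synthetic, noise-free data built from made-up coefficients
# (NOT the company's data and NOT the estimates in Table 3).
height = np.array([62.0, 64.0, 66.0, 68.0, 70.0, 72.0, 74.0, 65.0])
sex = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0])  # 1 = Female, 0 = Male
true_coef = np.array([-20.0, 0.45, -1.0])  # intercept, height, sex (hypothetical)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(height), height, sex])
shoe_size = X @ true_coef

# Ordinary least squares: choose b to minimise ||X b - shoe_size||^2.
coef, _, rank, _ = np.linalg.lstsq(X, shoe_size, rcond=None)
intercept, b_height, b_sex = coef

print(f"Shoe Size = {intercept:.2f} + ({b_height:.2f}) * Height + ({b_sex:.2f}) * Sex")
```

In a fit of this form, the coefficient on Sex estimates the average difference in shoe size between females and males of the same height, which is why the dummy coding is needed before the regression can be run.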

The regression analysis confirms the suspicion that shoe size is affected by gender and height. This shows that a standardized shoe size that fits everyone is impossible. However, the company cannot cover the production costs of all sizes, so it needs to find a central value to focus on for maximum sales and profits. The best approach is a point estimate together with a confidence interval. The results are displayed in Table 4, using a 95% confidence level.

The confidence interval is (8.256, 10.030)
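The interval can be reproduced from the summary statistics in Table 4 (N = 35, mean 9.143, standard deviation 2.583). The sketch below is a minimal check in Python, assuming the usual one-sample t interval; the critical value for 34 degrees of freedom is hard-coded from standard t-tables rather than computed.

```python
import math

# Summary statistics reported in Table 4.
n = 35
mean = 9.143
stdev = 2.583

# Two-sided 95% critical value of Student's t with n - 1 = 34 degrees of
# freedom, hard-coded from standard t-tables (an assumption of this sketch).
t_crit = 2.0322

se_mean = stdev / math.sqrt(n)  # standard error of the mean
margin = t_crit * se_mean       # half-width of the interval
lower, upper = mean - margin, mean + margin

print(f"SE Mean = {se_mean:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# SE Mean = 0.437, 95% CI = (8.256, 10.030)
```

Since shoe sizes in the data are whole numbers, the only sizes inside this interval are 9 and 10.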

This interval, however, does not give a useful answer, since Table 5 shows that only 5 of the 35 observations fall within it. According to Table 5, the most frequent shoe size is 7. The confidence interval therefore fails to identify the best choice, and 7 is the size to use for the best outcome.


Conclusions

The analysis shows that both gender and height significantly affect shoe size, so there is no single standard shoe size that suits all genders and heights. It is therefore impossible to obtain a perfect result and fully maintain revenue. However, based on the tests carried out, the best recommendation is size 7, since Table 5 shows it to be the most common size and it therefore offers maximum efficiency. Size 7 shoes are recommended for production.


Appendix

Table 1: Descriptive statistics for Shoe Size and Height

Table 2: Descriptive statistics for Gender.

Table 3: Regression analysis considering Shoe Size as dependent variable.

[Regression output table; recoverable column headers: Significance F, Standard Error, t Stat, Lower 95%, Upper 95%]

Table 4: 95% Confidence interval for Shoe Size

One-Sample T: Shoe Size

Variable    N   Mean   StDev  SE Mean  95% CI
Shoe Size   35  9.143  2.583  0.437    (8.256, 10.030)

Table 5: Counts for Shoe Size

Graph 1: Histogram and Box plot for Shoe Size

Graph 2: Histogram and Box plot for Height

Graph 3: Scatter plot of Height vs Shoe Size and Sex vs Shoe Size