3 Dataset Overview, Preprocessing, and Features
3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset
In this research, a company is deemed successful if it achieves one of three outcomes: an Initial Public Offering (IPO), an Acquisition (ACQ), or Unicorn status (UNIC), the last defined as a valuation exceeding $1 billion. To assemble the list of successful companies, we first selected IPOs with a valuation above $500 million or funds raised over $100 million, yielding 363 companies. For acquisitions, we removed companies whose purchase price was below the maximum amount of funds raised or under $100 million, leaving 833 companies. For unicorns, we selected companies with a valuation above $1 billion, using both Crunchbase data and an additional table of verified unicorns, for a total of 1,074 unicorns.
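The three selection rules above can be sketched as a small predicate. This is a minimal illustration, not the pipeline used in the study: the `CompanyRecord` fields, the `success_flags` function, and the use of a single `funds_raised` field for both the IPO and the acquisition rule are assumptions made for the example, as is treating the Crunchbase valuation and the verified-unicorn table as a union.

```python
from dataclasses import dataclass
from typing import Optional, Set

MILLION = 1_000_000
BILLION = 1_000_000_000


@dataclass
class CompanyRecord:
    """Hypothetical, simplified company record; all amounts are in USD."""
    name: str
    went_public: bool = False
    was_acquired: bool = False
    ipo_valuation: Optional[float] = None
    funds_raised: Optional[float] = None       # assumption: one field serves both rules
    acquisition_price: Optional[float] = None
    valuation: Optional[float] = None


def success_flags(company: CompanyRecord, verified_unicorns: Set[str]) -> Set[str]:
    """Return the subset of {"IPO", "ACQ", "UNIC"} that the company satisfies."""
    flags: Set[str] = set()
    raised = company.funds_raised or 0.0

    # IPO: valuation above $500M or funds raised over $100M.
    if company.went_public and (
        (company.ipo_valuation or 0.0) > 500 * MILLION or raised > 100 * MILLION
    ):
        flags.add("IPO")

    # ACQ: drop deals priced below the funds raised or under $100M.
    if company.was_acquired and company.acquisition_price is not None:
        price = company.acquisition_price
        if price >= 100 * MILLION and price >= raised:
            flags.add("ACQ")

    # UNIC: valuation above $1B, or listed in the verified table (assumed union).
    if (company.valuation or 0.0) > BILLION or company.name in verified_unicorns:
        flags.add("UNIC")

    return flags
```

A company is kept in the successful set when `success_flags` returns a non-empty set. Because one company can satisfy more than one rule, the per-outcome counts need not sum to the number of distinct successful companies.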
The final dataset contains a timeline of all crucial investment rounds leading up to the success event (i.e., achieving unicorn status, IPO, or ACQ); the index of that event is stored in the success_round column. Each successful company is thus represented by its funding history up to the point of success.
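One way to derive such an index from a list of rounds is sketched below. The `Round` type, its field names, the round-kind strings, the zero-based indexing, and the rule "first IPO, acquisition, or round valued above $1 billion" are all assumptions for illustration; the text does not specify how success_round is computed.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

UNICORN_THRESHOLD = 1_000_000_000  # USD


@dataclass
class Round:
    """Hypothetical funding-round record."""
    announced_on: date
    kind: str                                   # e.g. "seed", "series_a", "ipo", "acquisition"
    post_money_valuation: Optional[float] = None


def locate_success_round(rounds: List[Round]) -> Tuple[List[Round], Optional[int]]:
    """Sort rounds chronologically and return (timeline, index of the success event).

    The index is zero-based and is None when no round qualifies.
    """
    timeline = sorted(rounds, key=lambda r: r.announced_on)
    for index, r in enumerate(timeline):
        reached_exit = r.kind in ("ipo", "acquisition")
        reached_unicorn = (
            r.post_money_valuation is not None
            and r.post_money_valuation > UNICORN_THRESHOLD
        )
        if reached_exit or reached_unicorn:
            return timeline, index
    return timeline, None
```

Slicing `timeline[: index + 1]` then gives the rounds leading up to and including the success event.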
To supply the model with examples of 'unsuccessful' companies, we collected a separate dataset. We excluded companies already present in the successful companies dataset by removing those with IPO, ACQ, or UNIC flags. To avoid overlap, we also removed a considerable number of actual unicorns listed on the Crunchbase website [16]. We excluded companies that have not attracted any rounds since 2016, as well as companies that are subsidiaries or parent companies of other entities. Finally, we used the jobs dataset to exclude companies that have hired employees since 2017.
We applied a further filter to exclude companies with a valuation above $100 million, as they fall into a "gray area" of companies that cannot be clearly categorized as successful or unsuccessful. The resulting dataset comprises 32,760 companies labeled '0' (unsuccessful) and 1,989 companies labeled '1' (successful).
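The exclusion rules for the unsuccessful set and the final labeling can be sketched as follows. This is an illustrative reading of the rules as worded, not the authors' code: the `Candidate` fields, the function names, and the choice to treat "since 2016" and "since 2017" as inclusive of those years are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

GRAY_AREA_VALUATION = 100_000_000  # USD; above this, a company is left unlabeled


@dataclass
class Candidate:
    """Hypothetical record for a company considered for the unsuccessful set."""
    name: str
    has_success_flag: bool = False        # IPO, ACQ, or UNIC flag present
    is_related_entity: bool = False       # subsidiary or parent of another entity
    last_round_year: Optional[int] = None
    last_hire_year: Optional[int] = None  # derived from the jobs dataset
    valuation: Optional[float] = None


def is_unsuccessful_example(
    c: Candidate,
    known_unicorns: Set[str],
    round_cutoff: int = 2016,
    hire_cutoff: int = 2017,
) -> bool:
    """Apply the exclusion rules in the order given in the text."""
    if c.has_success_flag or c.name in known_unicorns:
        return False
    # Rule as worded: drop companies with no rounds since the cutoff year
    # (inclusive boundary is an assumption).
    if c.last_round_year is None or c.last_round_year < round_cutoff:
        return False
    if c.is_related_entity:
        return False
    if c.last_hire_year is not None and c.last_hire_year >= hire_cutoff:
        return False
    if c.valuation is not None and c.valuation > GRAY_AREA_VALUATION:
        return False
    return True


def build_labeled_dataset(
    successful_names: List[str],
    candidates: List[Candidate],
    known_unicorns: Set[str],
) -> List[Tuple[str, int]]:
    """Label successful companies 1 and surviving candidates 0."""
    successful = set(successful_names)
    labeled = [(name, 1) for name in successful_names]
    labeled += [
        (c.name, 0)
        for c in candidates
        if c.name not in successful and is_unsuccessful_example(c, known_unicorns)
    ]
    return labeled
```

With the counts reported above, the positive class makes up 1,989 / (32,760 + 1,989) = 1,989 / 34,749, or roughly 5.7% of the labeled data.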