Mathematical Modeling · March 25, 2020

Mathematical Modeling: 2020 MCM Problem C, Part 01

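It has been almost a month since the MCM ended. Because I don't much enjoy writing, and because I needed to rest and catch up on my regular coursework, I put off writing this summary until now.

The original text of Problem C is as follows: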

Problem C: A Wealth of Data

In the online marketplace it created, Amazon provides customers with an opportunity to rate and review purchases. Individual ratings – called “star ratings” – allow purchasers to express their level of satisfaction with a product using a scale of 1 (low rated, low satisfaction) to 5 (highly rated, high satisfaction). Additionally, customers can submit text-based messages – called “reviews” – that express further opinions and information about the product. Other customers can submit ratings on these reviews as being helpful or not – called a “helpfulness rating” – towards assisting their own product purchasing decision. Companies use these data to gain insights into the markets in which they participate, the timing of that participation, and the potential success of product design feature choices.

Sunshine Company is planning to introduce and sell three new products in the online marketplace: a microwave oven, a baby pacifier, and a hair dryer. They have hired your team as consultants to identify key patterns, relationships, measures, and parameters in past customer-supplied ratings and reviews associated with other competing products to 1) inform their online sales strategy and 2) identify potentially important design features that would enhance product desirability. Sunshine Company has used data to inform sales strategies in the past, but they have not previously used this particular combination and type of data. Of particular interest to Sunshine Company are time-based patterns in these data, and whether they interact in ways that will help the company craft successful products.

To assist you, Sunshine’s data center has provided you with three data files for this project: hair_dryer.tsv, microwave.tsv, and pacifier.tsv. These data represent customer-supplied ratings and reviews for microwave ovens, baby pacifiers, and hair dryers sold in the Amazon marketplace over the time period(s) indicated in the data. A glossary of data label definitions is provided as well. THE DATA FILES PROVIDED CONTAIN THE ONLY DATA YOU SHOULD USE FOR THIS PROBLEM.

Requirements

  1. Analyze the three product data sets provided to identify, describe, and support with mathematical evidence, meaningful quantitative and/or qualitative patterns, relationships, measures, and parameters within and between star ratings, reviews, and helpfulness ratings that will help Sunshine Company succeed in their three new online marketplace product offerings.
  2. Use your analysis to address the following specific questions and requests from the Sunshine Company Marketing Director:
     a. Identify data measures based on ratings and reviews that are most informative for Sunshine Company to track, once their three products are placed on sale in the online marketplace.
     b. Identify and discuss time-based measures and patterns within each data set that might suggest that a product’s reputation is increasing or decreasing in the online marketplace.
     c. Determine combinations of text-based measure(s) and ratings-based measures that best indicate a potentially successful or failing product.
     d. Do specific star ratings incite more reviews? For example, are customers more likely to write some type of review after seeing a series of low star ratings?
     e. Are specific quality descriptors of text-based reviews such as ‘enthusiastic’, ‘disappointed’, and others, strongly associated with rating levels?
  3. Write a one- to two-page letter to the Marketing Director of Sunshine Company summarizing your team’s analysis and results. Include specific justification(s) for the result that your team most confidently recommends to the Marketing Director.

Your submission should consist of:

  • One-page Summary Sheet
  • Table of Contents
  • One- to Two-page Letter
  • Your solution of no more than 20 pages, for a maximum of 24 pages with your summary sheet, table of contents, and two-page letter.

Note: Reference List and any appendices do not count toward the page limit and should appear after your completed solution. You should not make use of unauthorized images and materials whose use is restricted by copyright laws. Ensure you cite the sources for your ideas and the materials used in your report.

Glossary

Helpfulness Rating: an indication of how valuable a particular product review is when making a decision whether or not to purchase that product.

Pacifier: a rubber or plastic soothing device, often nipple shaped, given to a baby to suck or bite on.

Review: a written evaluation of a product.

Star Rating: a score given in a system that allows people to rate a product with a number of stars.

Attachments: The Problem Datasets

Problem_C_Data.zip

The three data sets provided contain product user ratings and reviews extracted from the Amazon Customer Reviews Dataset thru Amazon Simple Storage Service (Amazon S3).

hair_dryer.tsv

microwave.tsv

pacifier.tsv

Data Set Definitions: Each row represents data partitioned into the following columns.

  • marketplace (string): 2 letter country code of the marketplace where the review was written.
  • customer_id (string): Random identifier that can be used to aggregate reviews written by a single author.
  • review_id (string): The unique ID of the review.
  • product_id (string): The unique Product ID the review pertains to.
  • product_parent (string): Random identifier that can be used to aggregate reviews for the same product.
  • product_title (string): Title of the product.
  • product_category (string): The major consumer category for the product.
  • star_rating (int): The 1-5 star rating of the review.
  • helpful_votes (int): Number of helpful votes.
  • total_votes (int): Number of total votes the review received.
  • vine (string): Customers are invited to become Amazon Vine Voices based on the trust that they have earned in the Amazon community for writing accurate and insightful reviews. Amazon provides Amazon Vine members with free copies of products that have been submitted to the program by vendors. Amazon doesn’t influence the opinions of Amazon Vine members, nor do they modify or edit reviews.
  • verified_purchase (string): A “Y” indicates Amazon verified that the person writing the review purchased the product at Amazon and didn’t receive the product at a deep discount.
  • review_headline (string): The title of the review.
  • review_body (string): The review text.
  • review_date (bigint): The date the review was written.
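
To make the column definitions above concrete, here is a minimal sketch of how one might read such a file and derive two simple measures: a per-review helpfulness ratio and a monthly average star rating. It is only an illustration, not the approach our team used. The sample rows are made up, the function names (`load_reviews`, `helpfulness_ratio`, `monthly_mean_rating`) are hypothetical, and the month/day/year date format is an assumption that should be checked against the real files.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up rows for illustration only; they are NOT taken from the contest data.
SAMPLE_TSV = (
    "review_id\tstar_rating\thelpful_votes\ttotal_votes\treview_date\n"
    "R1\t5\t3\t4\t1/15/2015\n"
    "R2\t1\t0\t0\t1/20/2015\n"
    "R3\t4\t1\t2\t2/3/2015\n"
)


def load_reviews(handle):
    """Parse tab-separated rows and convert the numeric and date columns."""
    rows = []
    # QUOTE_NONE keeps stray quote characters in review text from confusing the parser.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for raw in reader:
        rows.append({
            "review_id": raw["review_id"],
            "star_rating": int(raw["star_rating"]),
            "helpful_votes": int(raw["helpful_votes"]),
            "total_votes": int(raw["total_votes"]),
            # Assumed date format (month/day/year); adjust if the files differ.
            "review_date": datetime.strptime(raw["review_date"], "%m/%d/%Y"),
        })
    return rows


def helpfulness_ratio(row):
    """Return helpful_votes / total_votes, or None when nobody voted."""
    if row["total_votes"] == 0:
        return None
    return row["helpful_votes"] / row["total_votes"]


def monthly_mean_rating(rows):
    """Return the average star rating keyed by (year, month)."""
    buckets = defaultdict(list)
    for row in rows:
        key = (row["review_date"].year, row["review_date"].month)
        buckets[key].append(row["star_rating"])
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


reviews = load_reviews(io.StringIO(SAMPLE_TSV))
ratios = [helpfulness_ratio(r) for r in reviews]
monthly = monthly_mean_rating(reviews)
```

To run this on a real file, one could pass an opened file instead of the in-memory sample, for example `open("hair_dryer.tsv", encoding="utf-8", newline="")`. Reviews with zero total votes are returned as `None` rather than 0 so that "no votes" is not confused with "voted unhelpful".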

I'd like to hold off on analyzing the problem itself and first share some takeaways from the contest.

Our rough schedule was as follows:

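Day 1: choosing the problem and working on the first question

Day 2: I spent most of my time wrestling with how to quantify the reviews (in plain terms, data preprocessing)

Day 3: most of the remaining questions

Day 4: the last two sub-questions and final polishing of the paper.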

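Here is what I think deserves attention in the MCM:

1. Good communication

Because of the epidemic we were all isolated at home and could only talk through online calls. Fortunately my teammates were all very responsible, and nobody flaked. Even so, I think our communication could have been more efficient. The ability to express yourself may not seem that important, but it can be decisive. A team that has practiced working together also communicates more efficiently as a matter of course, so it is worth running several mock contests beforehand. Teammates often read the same problem from different angles, so their starting points and the scope they consider can differ a lot. Resolving those differences is exactly what communication is for.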

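2. Solid foundational knowledge

Also because of the epidemic, the number of problems available to us dropped from six to three, which halved our choices. You could say this saved us time on problem selection, but it also placed a correspondingly stricter demand on the breadth of our knowledge.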

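3. Good physical condition

After roughly half a year of anxiety and stress, followed by half a month of intense exam review (going to bed at 3 a.m. and eating one meal a day for the whole two weeks), I came home in poor shape. I tried to recover, but I was still very weak. During the four days as the team's programmer, I felt fine for the first two. I don't know whether it was the long hours spent on quantifying the reviews or something else, but after that, looking at the screen for long stretches made me dizzy and nauseous, and on the last night, when I stayed up past midnight, I had palpitations, nausea, and other symptoms. As a result I started taking traditional Chinese medicine after the contest (I still am as I write this), and I have not fully recovered. I am now torn over whether to keep doing modeling contests or to stop here.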

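As for other points, let me think a bit more...

As the programmer on the team, I felt squeezed from both sides (I'm not sure that is the right way to put it): we have to meet the programming needs of the models and also supply good figures and data for the paper. Strong programming and search skills really matter.

That's all for now; I'm getting a bit tired of typing. That's it.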