What makes a movie successful?
It is a question that movie producers and film studios have been dying to answer.
But, what is "success"?
Success is a broad term that describes the idea of reaching a purpose or goal. Our goal is to know what makes a movie successful, but what does it mean to be successful?
We decided to merge the point of view of the public with that of the producers to answer this question. We therefore combine the financial side (revenue) with the average rating, weighting the rating by the number of votes it received. This can be translated mathematically as:
success = (Standardize(revenue) + Standardize(log(numVotes) * averageRating)) / 2
We are here to see what affects this metric! Will the language in which the movie is recorded, the languages that are spoken in the movie, or the presence of a famous actor be a factor? We will dig into each question and show you the result at the end ;)
Scroll down to dig into this complicated but exciting world.
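As a minimal sketch, the success metric can be computed as follows. It assumes that Standardize means a z-score (subtract the mean, divide by the standard deviation) and that log is the natural logarithm; neither is stated explicitly here. The three movies and their values are made up for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Z-score (assumed meaning of Standardize): (value - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def success_scores(revenue, num_votes, average_rating):
    """Average of standardized revenue and standardized vote-weighted rating."""
    # Natural log is an assumption; the formula only says log(numVotes).
    rating_term = [math.log(n) * r for n, r in zip(num_votes, average_rating)]
    z_revenue = standardize(revenue)
    z_rating = standardize(rating_term)
    return [(a + b) / 2 for a, b in zip(z_revenue, z_rating)]

# Made-up values for three hypothetical movies.
revenue = [1.0e6, 5.0e7, 3.0e8]
num_votes = [1_200, 45_000, 900_000]
average_rating = [5.9, 7.1, 8.3]

scores = success_scores(revenue, num_votes, average_rating)
for score in scores:
    print(round(score, 3))
```

Because both terms are standardized, the metric is centred on zero: a positive value means a movie did better than the average movie in the data.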
Are languages impactful on the success?
An initial question that we might have is "which languages lead to a more successful movie?". To approach this, we analysed the languages that appear in the most movies and plotted the ones with the highest success. We chose a threshold to remove the languages with too few movies to be significant. For example, had we taken the whole dataset into account, the most successful language would have been the Old English Language. However, this language appears in only three movies. We therefore show only the languages that appear in at least 50 movies. We can then conclude that, in general, movies that contain some spoken Arabic, German and French are the most successful.
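The thresholding step described above can be sketched like this. The record layout (a "languages" list and a "success" value per movie) is a hypothetical one chosen for illustration, and the toy data uses a threshold of 2 instead of 50.

```python
from collections import defaultdict
from statistics import mean

def mean_success_by_language(movies, min_movies=50):
    """Return (language, mean success, count) rows sorted by mean success,
    keeping only languages that appear in at least `min_movies` movies."""
    by_language = defaultdict(list)
    for movie in movies:
        # set() avoids counting a language twice for the same movie.
        for language in set(movie["languages"]):
            by_language[language].append(movie["success"])
    ranked = [
        (language, mean(values), len(values))
        for language, values in by_language.items()
        if len(values) >= min_movies
    ]
    return sorted(ranked, key=lambda row: row[1], reverse=True)

# Made-up toy data; the threshold is lowered to 2 so that something survives.
movies = [
    {"languages": ["English", "French"], "success": 0.8},
    {"languages": ["English"], "success": -0.2},
    {"languages": ["French"], "success": 0.4},
    {"languages": ["Old English"], "success": 2.5},
]
ranking = mean_success_by_language(movies, min_movies=2)
for language, avg, count in ranking:
    print(language, round(avg, 2), count)
```

In the toy data, the language with the single highest score is dropped because it appears in only one movie, which is the same effect the threshold has on rare languages in the real data.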
Another interesting question to consider would be "How important is the number of languages spoken in a movie for its success?". The plot below answers this question: as the number of languages increases, the likelihood of success increases. We can also note that the standard deviation increases. This means that below 4 languages, the historical data suggest with some certainty that a movie will have a success below 0.5, while with more languages there is a chance that the movie is successful, but also that it is not!
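A plot like this one relies on the mean and the standard deviation of success for each number of languages. Here is a small sketch of that grouping, using a hypothetical record layout and made-up values.

```python
from collections import defaultdict
from statistics import mean, stdev

def success_by_language_count(movies):
    """Map number of languages -> (mean success, sample standard deviation)."""
    groups = defaultdict(list)
    for movie in movies:
        groups[len(movie["languages"])].append(movie["success"])
    return {
        n: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for n, values in sorted(groups.items())
    }

# Made-up toy data: two movies with 1 language, two movies with 3 languages.
movies = [
    {"languages": ["English"], "success": -0.4},
    {"languages": ["French"], "success": -0.2},
    {"languages": ["English", "French", "German"], "success": -1.0},
    {"languages": ["English", "Arabic", "German"], "success": 1.6},
]
summary = success_by_language_count(movies)
for n, (avg, spread) in summary.items():
    print(n, round(avg, 2), round(spread, 2))
```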
What about the countries that appear in a movie?
This analysis is very similar to the one above; in this case, we look at the countries where the movie's scenes are recorded. We can see that movies shot in the United Kingdom are the most likely to be successful, whereas movies filmed in Canada are the least likely, among all countries with over 50 films in our datasets.
In order to get further insight we have displayed the distribution of the number of movies per country.
This shows that the mean success of movies, as a function of the number of countries they include, is around zero. Interestingly, however, a movie filmed in 2 or 3 different countries is more likely to be successful.
Which genres drive greater success?
Did you ever choose a movie because of its genre?
Furthermore, did you ever like a movie more because of this genre?
Let us try to anticipate your answer and that of your peers. The following graph displays the 9 most successful genres. We can see that the genre that most often leads to successful movies is fantasy, followed by the others shown beneath it. On the other hand, horror movies seem to have the lowest success, so we would advise you to think carefully before deciding on this genre.
Does runtime have any effect on a movie's success?
This is an interesting question, especially when planning a movie. As a producer, you may wonder whether telling the story in a short time will heighten the suspense and hold the viewer's attention throughout. You might also think that a long movie will better convey the moral of the story and make the trip to the cinema worthwhile. No worries, the data can tell us the truth.
At first glance, the Pearson correlation between runtime and success, 0.335, suggests that longer movies are actually preferable. But can we visualize this relation? Indeed, a scatter plot of both features along with the regression line highlights the positive correlation between runtime and movie success. With such a positive slope, we expect that a short runtime will hurt a movie's success and that making it longer will definitely increase it. Well... not so definitely. We can be confident in the former expectation, as the narrow confidence intervals reveal. For longer runtimes, however, the increase in success is less certain, as the confidence intervals widen more and more. In all cases, when the other movie features are "optimized", a longer movie is more likely to be successful!
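For readers who want to see where the correlation and the regression line come from, here is a minimal sketch that computes both by hand. The runtimes and success values are made up, so the numbers it prints are not the ones from our data, and it does not compute the confidence intervals.

```python
from statistics import mean

def pearson_and_line(x, y):
    """Return (Pearson r, slope, intercept) of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept

# Made-up runtimes (minutes) and success scores for five hypothetical movies.
runtime = [80, 95, 110, 125, 150]
success = [-0.5, -0.1, 0.0, 0.4, 0.3]

r, slope, intercept = pearson_and_line(runtime, success)
print(round(r, 3), round(slope, 4), round(intercept, 3))
```

The slope and the correlation always share the same sign, which is why a positive Pearson coefficient goes hand in hand with an upward regression line.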
What about the combination of genres and languages?
While reading the sections above, you might be wondering: "But maybe the categories are important because of some interaction between each other?". Then let's assess this problem. As a first step, we assume that the content of a movie is best captured by the languages spoken in it and by its genres. One example you can think about is Gaelic in The Lord of the Rings, which is a fantasy movie. Such a pair gives us more information about the content of the movie and therefore lets us assess its success more effectively.
These two graphs show the mean success and the number of occurrences of each pair of language and genre. We can see that the most frequent pair is English Language with Drama, with roughly 2500 movies. However, even though this pair is the most common, it is not the most successful. The probable reason is that an English-language drama may turn out to be very successful, but also very unsuccessful. Looking at the left column, we can see that the languages are more unexpected and actually give us some information about the movie. For example, the most successful pair is Old English Language with Drama, which may correspond to a certain type of movie. This suggests that, while the single features are probably important for success, the interaction between them is even more important.
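The two quantities shown in these graphs, mean success and number of occurrences per pair, can be sketched as below. The record layout and the three movies are hypothetical and only meant to show how the most frequent pair and the most successful pair can differ.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

def pair_stats(movies):
    """Map (language, genre) -> (mean success, number of movies)."""
    groups = defaultdict(list)
    for movie in movies:
        # Every language of the movie is paired with every one of its genres.
        for pair in product(set(movie["languages"]), set(movie["genres"])):
            groups[pair].append(movie["success"])
    return {pair: (mean(values), len(values)) for pair, values in groups.items()}

# Made-up toy data.
movies = [
    {"languages": ["English Language"], "genres": ["Drama"], "success": 1.2},
    {"languages": ["English Language"], "genres": ["Drama"], "success": -1.0},
    {"languages": ["English Language", "Old English Language"],
     "genres": ["Drama"], "success": 0.9},
]
pairs = pair_stats(movies)
most_common = max(pairs, key=lambda p: pairs[p][1])
most_successful = max(pairs, key=lambda p: pairs[p][0])
print(most_common, most_successful)
```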
Does the release date matter?
The figure is interesting because it shows 2 periods in which the mean success is positive. These periods correspond to the holidays (summer and Christmas), shifted about 1 month earlier. This can be explained by the fact that, once released, a movie is shown in cinemas for around 1 month. The distribution of success may therefore be due to people going to the cinema more during the holidays.
An analysis of actors
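A figure of this kind comes from grouping movies by release month and averaging their success. Here is a small sketch, assuming each movie carries a release date; the field names and values are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def mean_success_by_month(movies):
    """Map release month (1-12) -> mean success of the movies released in it."""
    groups = defaultdict(list)
    for movie in movies:
        groups[movie["release_date"].month].append(movie["success"])
    return {month: mean(values) for month, values in sorted(groups.items())}

# Made-up toy data.
movies = [
    {"release_date": date(2010, 6, 18), "success": 0.9},
    {"release_date": date(2011, 6, 3), "success": 0.5},
    {"release_date": date(2010, 9, 10), "success": -0.4},
    {"release_date": date(2012, 11, 21), "success": 0.6},
]
monthly = mean_success_by_month(movies)
for month, avg in monthly.items():
    print(month, round(avg, 2))
```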
Now, let’s analyse movies with respect to their actors.
First, we can focus on the number of actors in a movie. Can it influence the success of a movie?
The plot above shows that the mean success of a movie is positively correlated with the number of actors that appear in it. Clearly, monologues are not big hits, and while the number of actors is relatively small, there is a very strong positive relationship between more actors and a more successful movie. However, a large number of actors can have either a very positive or a very negative impact on how the public perceives the movie. Still, the trend is that success increases with the number of actors.
It's also interesting to look at movies by the dominant gender of their actors. And guess what? Over two thirds of movies have more male actors than female ones. Surprising?
But does this actually play a role in the success of movies? The plot below indicates that this is the case, with male-dominated movies having a higher success metric.
What about including famous vs. less well-known actors?
We found a list from IMDb of the most famous actors. We created a new parameter, is_famous, which indicates for each movie whether at least one of its actors is among the 50 most famous. Indeed, the mean success is much higher when the movie features a famous actor.
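The is_famous parameter boils down to a membership test against the list of famous actors. Below is a minimal sketch; the actor names, the record layout and the helper mean_success_by_flag are placeholders for illustration, not the actual list or code.

```python
from statistics import mean

def is_famous(cast, famous_actors):
    """True if at least one cast member is in the set of famous actors."""
    return not famous_actors.isdisjoint(cast)

def mean_success_by_flag(movies, famous_actors):
    """Map is_famous (True/False) -> mean success of the matching movies."""
    groups = {True: [], False: []}
    for movie in movies:
        groups[is_famous(movie["cast"], famous_actors)].append(movie["success"])
    return {flag: mean(values) for flag, values in groups.items() if values}

# Placeholder names standing in for the top-50 list; success values are made up.
famous_actors = {"Actor A", "Actor B"}
movies = [
    {"cast": ["Actor A", "Actor X"], "success": 1.1},
    {"cast": ["Actor Y"], "success": -0.3},
    {"cast": ["Actor B"], "success": 0.7},
    {"cast": ["Actor X", "Actor Z"], "success": -0.1},
]
by_flag = mean_success_by_flag(movies, famous_actors)
print(by_flag)
```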
What effect do production companies have?
We took a list of major production companies from Wikipedia to analyse whether movies produced by these film studios have higher success than average movies. Indeed, movies that were produced by a major film studio have much higher success metrics, which could be explained by the fact that these studios can afford higher quality cameras, better actors, editors, writers, and more.
What about the subjects of movies?
To dig into the subjects of movies, we carried out a topic analysis. We trained a Latent Dirichlet Allocation (LDA) model on the movies with a minimum success (success higher than 1) to group our corpus into its most recurring topics. Here are word clouds for each genre. Slide to see each cloud!
It is interesting to note that the notion of killing arises frequently. We can check whether the presence of the lexical field of 'killing' has any influence on success. After applying some preprocessing to the plot summaries (stemming, stopword removal, lemmatization), we built a new parameter 'is_kill' which indicates whether the notion of 'killing' is present in a movie summary.
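As a rough illustration of the is_kill flag, the sketch below replaces the preprocessing pipeline with a simple regular expression on word stems. This is a swapped-in simplification, and the list of stems is an assumption made for the example, not the vocabulary actually used.

```python
import re

# Assumed word stems for the lexical field of "killing" (illustrative only).
KILL_PATTERN = re.compile(r"\b(?:kill|murder|assassin)\w*", re.IGNORECASE)

def is_kill(summary):
    """True if the summary contains a word starting with one of the stems."""
    return KILL_PATTERN.search(summary) is not None

# Made-up summaries.
print(is_kill("A detective hunts a serial killer."))   # True
print(is_kill("Two friends open a bakery in Paris."))  # False
print(is_kill("A chef with remarkable skills."))       # False: "skills" does not start with "kill"
```

The word boundary at the start of the pattern keeps words such as "skills" from being counted, which a plain substring search would get wrong.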
Wow, interesting, it is indeed influencing the success of the movie! It seems that people like to see someone dying...
Then, how curious are you to know the end of this story?
After having explored from every possible angle the features that may affect success, we have now reached the results! Believe us when we tell you that we are as excited as you are...
The figure on the right holds the answer to all the hypotheses that you have imagined so far. We ran a linear regression, which lets us understand which features are actually important and which are not.
In this graph, we included all the features we had, except those we discarded, for example the countries and some language dummies, since they did not provide insightful information. We can see that most of the features shown have a p-value < 0.05 and are therefore statistically significant in the regression. The only ones that are not are the year, some genres, and the languages we included.
However, the languages still matter, because we also included the cross terms of the most successful language and genre pairs from the previous graph. These lead to very insightful parameters and allow us to say that the language combined with the genre gives us information and improves the model.
We also checked that the residuals of the regression were normally distributed. This is the case, which is a good sign for the regression.
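A cross term is simply the product of two dummy variables: it equals 1 only when a movie has both the language and the genre of the pair. Here is a minimal sketch of how such features could be built before feeding them to a regression; the function name, feature names and record layout are hypothetical.

```python
def interaction_features(movie, pairs):
    """Build one 0/1 cross term per (language, genre) pair."""
    features = {}
    for language, genre in pairs:
        has_language = int(language in movie["languages"])
        has_genre = int(genre in movie["genres"])
        # The product is 1 only if both dummies are 1.
        features[f"{language} x {genre}"] = has_language * has_genre
    return features

# Illustrative pairs and a made-up movie.
pairs = [("Old English Language", "Drama"), ("English Language", "Fantasy")]
movie = {"languages": ["English Language", "Old English Language"], "genres": ["Drama"]}

features = interaction_features(movie, pairs)
print(features)
```

A cross term can be significant even when the language dummy on its own is not, because it isolates the movies where both conditions hold at once.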
In the end, we therefore know that the following things help movies succeed:
Including famous actors
Being produced by a major film studio
Being released during holidays
Having fewer actresses than male actors
The theme of killing being present
But none of these will be as relevant as the right combination of language and genre!