‘What Women Want’ – Saree Buyology

There are over 80 ways of draping a saree. These are mostly styles based on the region. And with the region come the designs and the material. So, the permutations and combinations that one has to go through to choose “one perfect saree” are numerous. So, we set out to understand ‘What Women Want’ while choosing a saree. This was to better understand the social, economic and cultural perceptions of the saree today, and to attempt a solution using new technologies like AI and vision computing to make the experience of buying that “one perfect saree” memorable, social and hassle-free.

Sarees are among the oldest articles of clothing on earth and pre-date most of the clothing cultures we have now. They trace back to around 2000 to 1800 BC (https://en.wikipedia.org/wiki/Sari). Sarees have not just lasted this long; their trends and designs have also modernized with time. Even the manufacturing and sale of sarees have become more sophisticated. So, you might think choosing a saree should be as simple as choosing a shirt or a pair of trousers. It’s just not that simple.

Understanding Who Wears Sarees Today

The New York Times claims saree draping to be part of a nationalist agenda (https://www.nytimes.com/2017/11/12/fashion/india-nationalism-sari.html), but as stated before, saree draping predates all of this. So, we decided to look at the subcontinent as a whole across multiple factors. Drawing on surveys and trends, our findings on who wears sarees are as follows:

  • Religion divide
    A survey by the NSSO states that the saree is not just a Hindu attire: Christian and Muslim households also spend a considerable share of their women’s clothing budget on sarees.
  • Economic divide
    The saree cuts across the class divide. Saree buying among the affluent class is at 77%, which is only slightly higher than the bottom class’s 72%.

Understanding What Women Look For In A Saree

Building on our earlier understanding that most southern states in the subcontinent favor sarees more than the northern ones, we conducted a survey with a small sample of women, mostly from tier 2 and tier 3 towns, to understand what they look for in a saree.

who is interested in sarees

In the towns, it’s a growing trend that the majority of women in their 20s move to the cities for work and education. The women above 50 in the towns are the parents of the children who are moving to the cities. So, the majority of sarees are being purchased by women who are over 40 years old.

what design are they looking for

The graph shows that the majority of people looking to buy a saree look for thread work. This could be because thread work is easier to maintain than stones, yet not as plain as checks and prints. Prints come second, but lag by a fair margin.

Is Design Everything

In every other branding that we see for sarees, we see all the bells and whistles. The shiny rocks on the saree, the glossy silk, the simple prints, but what are people actually looking for?

Design seems to be the least important preference. The feel, the material and the quality are what buyers look for. In other words, the longevity and usability of a saree are more important than its design. Also, 80% of the women who chose design are between 20 and 35 years old.

Apart from the above insights we have also discovered the following:

  • Gifting a saree is very common in India
  • The majority of women buying sarees buy them for everyday use; fancy sarees are bought for special occasions, where all classes tend to spend more than usual on that one saree.

So, now that we know what women look for in a saree, let’s look at their buying behavior.

Understanding How They Shop For Sarees

A buying process starts in the mind of the consumer, who then weighs alternative products against each other by their relative advantages and disadvantages. From earlier, we know that quality ranks first, followed by color and design, comfort and style, and price, in that order.

From more surveys and interviews, we understand the general shopping patterns. 

The graph here shows how the market has been growing, with larger name brands across all classes scaling ethnic wear in India. It also shows how affordable ethnic brands are in comparison to western wear.

(Source: Technopak, Wazir Advisors, Equity research and Avendus analysis)

The saree market is one of the largest apparel markets in India. There is a significant shift away from traditional sarees towards ethnic wear and western wear. Though growth seems to be slower for sarees, they would still be the market leader in the years to come.

The Influencers

  • Increasing number of occasions

With growing social boundaries, the number of occasions has increased in India. Formal, informal and traditional occasions have led women to expand their wardrobes.

  • Impulsive buying

With offers everywhere and technology in the palm of your hand, the pull of any commodity has fueled impulsive buying for the average Indian.

  • Influence of media

Soaps, movies, ads, social media, personal messages: the visual format of content sharing gives users millions of options and contributes to this change in behaviour.

  • Increase in fashion sense

With evolving fashion and media, people are looking not just for utility but for aesthetics too. And with larger brands spreading across the country with production at scale, aesthetic clothing is affordable to everyone.

  • Aspirational buying

Women today are empowered with the ability of higher spending. Along with it, good clothing is aspirational too. A memorable occasion needs aspirational clothing to complete it. 

Where Do They Buy Sarees From

From local vendors selling on Instagram, Facebook Marketplace, Amazon and Flipkart to larger chains and stores, women today shop for sarees in every channel available. The online market is among the most important reasons for the growth of saree sales in India. Since sarees are adopted mostly in rural areas, where internet penetration is increasing day by day, this may open up brand new revenue pockets for stakeholders in the Indian saree business.

While the online market and pop-up stores mostly take care of impulse buying and the everyday need for sarees, when it comes to shopping for occasions and events, women still prefer buying sarees in larger stores or from reputed brands. They don’t mind the extra effort or the overhead cost that retail stores carry.

The increasing penetration of the Internet, the increasing buying power of women, high brand consciousness and fashion sense have made e-commerce a crucial medium of shopping.

Customer saree shopping journey

From the customer’s point of view, buying a saree is a deeply embedded process with numerous points of friction and points of leverage. The customer’s interaction with the shopkeeper is only one part of a larger journey that they’re on.

The above is just an outline of the shopping experience. The nuances and the conditions they evaluate change with every customer.

Conclusion: (This Is) What Women Want.

  • Quality and assurance of the commodity play a major role in the purchase
  • Emotionally, validation and feedback on what they wear play a great role in the choices women make while buying a product
  • Validation and feedback on a product are obtained through conversation about the look, feel and cost of a saree
  • Though buying a saree requires evaluating quality and feel, women prefer to judge a saree first by its design, work and other visual elements
  • Brand names play a major role
  • The idea behind fashionable clothing is to make someone look beautiful so the search is always for a saree that one looks beautiful in

Real Time Human Pose Estimation on the edge with Movidius NCS and OpenVINO

An approach towards low cost computing on the edge for vision based AI applications


Pose estimation is a computer vision approach to detect the important parts of a human body in an image or video. It gives the pixel locations of eyes, elbows, arms, legs, etc. for one or more human bodies in an image. In other words, the algorithm gives the locations of the “joints” of a body. Pose is a broader subject; here we focus only on human body pose estimation. None of the algorithms is perfect, and all are heavily dependent on their training data.

How is it useful?

Human pose detection on the edge can be used to read body language and body movement in real-time at the same location as the person/s. This enables numerous applications in Security, Retail, Healthcare, Geriatric care, Fitness, Sports domains. Coupled with Augmented/Mixed Reality, we can transpose a human into a virtual world thus opening up newer opportunities and experiences in Fashion retail, Entertainment, Advertising and Gaming. Along with gesture recognition you can interact with the virtual world.

What is Movidius NCS?

If you have not heard of Intel’s Neural Compute Stick, it is a small device that plugs in via USB port and runs deep neural networks. Think of it as a USB graphics card that is optimised to run certain deep learning frameworks and models. Being a USB device, it can be run on an edge computing device such as a Raspberry Pi. It is low powered and comparatively small. These points make it a very good choice to run machine learning models on edge. If you are looking for something more embedded you can look at the VPUs from Intel.

OpenVINO provided OpenPose Model

OpenVINO provides a set of pre-trained models which can be run on Movidius NCS without having to go through the conversion process. One of the pre-trained models is human-pose-estimation. It is a multi-person model, based on MobileNet V1 and trained using the Caffe framework.

This model is a larger architecture based on OpenPose. Its complexity is 15 GFLOPs, with 42.8% average precision on the COCO dataset. The high complexity of the model is a bottleneck that renders it unusable on the edge for real-time detection: during our benchmarks, the model gave 2 FPS on Movidius NCS 1. However, its accuracy was higher than PoseNet’s.

Tensorflow JS Posenet Model

Google has released a freely available, pre-trained model for pose estimation in the browser, called PoseNet. You can refer to this blog post to learn more about the model and its architecture.

In brief, the model is based on MobileNet V1 and is trained to detect single-person or multi-person poses. The model is optimised to run on TensorFlow.js, which means it is light enough to run in a web browser.

Here is an overview of what we are going to do:

  1. Convert Tensorflow JS model to a normal Tensorflow model
  2. Install OpenVINO
  3. Convert Tensorflow model to OpenVINO supported format
  4. Run the model on Movidius NCS

Convert tfjs to Tensorflow

You can take one of the following 3 ways to get a .pb file:

  1. Download the files generated by us: click here to download
  2. Convert it yourself using tfjs-converter
  3. Use this repo, which downloads and converts the tfjs models for you

The simplest way is to download the ones we have given. That way you don’t have to install extra stuff on your computer and worry about the process of conversion.

As you will notice, there are 3 important files:

  1. model-mobilenet_v1_050.pb
  2. model-mobilenet_v1_075.pb
  3. model-mobilenet_v1_100.pb

These files refer to the different versions of MobileNet on which the pose estimator has been trained. To simplify: 050 is the fastest but the least accurate, 075 is more accurate but slower than 050, and 100 is the slowest but the most accurate of the three.

Which one should you choose? Keep reading, we are going to evaluate which model gives the best trade-off of accuracy and speed soon!

Install OpenVINO

To be able to run the model on Movidius NCS, we are going to use Intel’s distribution of OpenVINO toolkit. OpenVINO can be installed on Linux, Windows & Raspbian OS. You can follow the official instructions to install the toolkit. We have installed the toolkit on Ubuntu 16.04 to convert the model, and used Raspbian to run the model.

Step 1:

Install the OpenVINO toolkit on your Linux machine. Keep in mind that you won’t be able to convert a TensorFlow model to the OpenVINO-supported format on a Raspberry Pi, so this installation is a must (or install it on Windows).

Step 2:

Install the OpenVINO toolkit on Raspbian. The Raspbian installation of the toolkit contains only the inference engine, which means you cannot convert your TensorFlow (or Caffe, MXNet) models to the Intermediate Representation supported by OpenVINO; you will only be able to run inference on already converted models.

Next, we are going to:

  1. Convert tensorflow model to Intermediate Representation on a Linux machine
  2. Run inference on Raspberry Pi

Convert Tensorflow Model to OpenVINO Intermediate Representation

Intermediate Representation (IR) of a model is a file format recognised by OpenVINO toolkit, which is optimised to run on edge computing devices such as Movidius NCS.

Run the following command in your terminal:

python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
        --input_model ~/Downloads/posenet_tensorflow_models/model-mobilenet_v1_075.pb \
        --framework tf \
        -o ~/posenet/ \
        --input image \
        --input_shape [1,224,224,3] \
        --output "offset_2,displacement_fwd_2,displacement_bwd_2,heatmap" \
        --data_type FP16

This will give you model-mobilenet_v1_075.xml and model-mobilenet_v1_075.mapping, along with a model-mobilenet_v1_075.bin file holding the weights that the .xml refers to. These files are necessary to run inference on Movidius NCS.

You can point --input_model at the other versions of PoseNet (050 and 100) to get their Intermediate Representations.

Transfer the generated files to your Raspberry Pi and continue to the next step!
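If you want all three variants, the command above can be assembled in a short script instead of being edited by hand. The following is a minimal sketch that only builds the argument lists; the paths are the ones used in the command above, the helper name build_mo_command is our own, and it assumes the output node names are the same for every variant.

```python
import os
import subprocess  # only needed if you uncomment the run() call below

MO_SCRIPT = "/opt/intel/openvino/deployment_tools/model_optimizer/mo.py"
MODEL_DIR = os.path.expanduser("~/Downloads/posenet_tensorflow_models")
OUTPUT_DIR = os.path.expanduser("~/posenet/")
OUTPUT_NODES = "offset_2,displacement_fwd_2,displacement_bwd_2,heatmap"


def build_mo_command(variant):
    """Return the Model Optimizer argument list for one PoseNet variant, e.g. "075"."""
    model_path = os.path.join(MODEL_DIR, "model-mobilenet_v1_{}.pb".format(variant))
    return [
        "python3", MO_SCRIPT,
        "--input_model", model_path,
        "--framework", "tf",
        "-o", OUTPUT_DIR,
        "--input", "image",
        "--input_shape", "[1,224,224,3]",
        "--output", OUTPUT_NODES,
        "--data_type", "FP16",
    ]


for variant in ("050", "075", "100"):
    command = build_mo_command(variant)
    print(" ".join(command))
    # Uncomment on a machine where the OpenVINO toolkit is installed:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the bracketed input shape.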

Running Inference on Raspberry Pi

Assuming you have installed the OpenVINO toolkit on your Raspberry Pi and have transferred the .mapping and .xml files, it is time to clone the repository.

The repository contains code to run benchmarks on Movidius. To keep the benchmarks clean and the code simple, it does not perform any image post-processing. You can write an OpenCV layer to render the key points on top of your input image.
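As a starting point for that post-processing, here is a rough sketch of single-person decoding as described in the PoseNet blog post: for each keypoint, take the heatmap cell with the highest score and refine it with the offset vector stored at that cell. It is not taken from the benchmark repository. It assumes heatmaps shaped [H][W][K] and offsets shaped [H][W][2K] (y offsets first, then x offsets), an output stride of 16, and plain nested lists standing in for the arrays the inference engine returns.

```python
import math

OUTPUT_STRIDE = 16  # assumption: depends on how the model was exported


def decode_single_pose(heatmaps, offsets, output_stride=OUTPUT_STRIDE):
    """Return one (y, x, score) tuple per keypoint, in input-image pixels."""
    height = len(heatmaps)
    width = len(heatmaps[0])
    num_keypoints = len(heatmaps[0][0])
    keypoints = []
    for k in range(num_keypoints):
        # Find the heatmap cell with the highest raw score for keypoint k.
        best_y, best_x = max(
            ((y, x) for y in range(height) for x in range(width)),
            key=lambda cell: heatmaps[cell[0]][cell[1]][k],
        )
        # Offsets hold y displacements in channels [0, K) and x in [K, 2K).
        y = best_y * output_stride + offsets[best_y][best_x][k]
        x = best_x * output_stride + offsets[best_y][best_x][k + num_keypoints]
        # Squash the raw heatmap value into a 0..1 confidence score.
        score = 1.0 / (1.0 + math.exp(-heatmaps[best_y][best_x][k]))
        keypoints.append((y, x, score))
    return keypoints
```

The returned coordinates can then be drawn on the input image, for example as small circles, after dropping keypoints whose score falls below a threshold of your choosing.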

Make sure your Movidius NCS is attached to the Raspberry Pi. Download an image of a person from the Internet and save it. Let’s call the downloaded image’s location $IMAGE_PATH. Next, move your model-mobilenet_v1_075.xml and model-mobilenet_v1_075.mapping files to the repository’s root.

Execute the following command in your terminal to run inference on Raspberry Pi:

python3 run_inference.py -m ./model-mobilenet_v1_075.xml -d MYRIAD -i $IMAGE_PATH


FPS comparison of different mobilenet models

The smallest model performs the fastest, with 42 frames per second! Check out the videos to see how accurate each of them is:

Posenet50 at 30 FPS
Posenet75 at 30FPS
Posenet100 at 12FPS

We recommend the 075 version, because 30 FPS is smooth enough for the human eye to consider it real time, and the accuracy is acceptable for many use cases. However, you might want to consider another version depending on your use case.


  1. Real-time Human Pose Estimation in the Browser with TensorFlow.js
  2. OpenVINO Documentation
  3. Download converted Tensorflow JS Models
  4. GitHub Repository to run inference on RPi
  5. posenet-python GitHub repository
  6. Tfjs-converter
  7. Tensorflow Pose Estimation
  8. Wikipedia — Pose
  9. OpenVINO pre-trained models

Pose Estimation Benchmarks on intelligent edge

Benchmarks on Google Coral, Movidius Neural Compute Stick, Raspberry Pi and others


In an earlier article, we covered running PoseNet on Movidius. We saw that we were able to achieve 30FPS with acceptable accuracy. In this article we are going to evaluate PoseNet on the following mix of hardware:

  1. Raspberry Pi 3B
  2. Movidius NCS + RPi 3B
  3. Ryzen 3
  4. GTX1030 + Ryzen 3
  5. Movidius NCS + Ryzen 3
  6. Google Coral + RPi 3B
  7. Google Coral + Ryzen 3
  8. GTX1080 + i7 7th Gen

This is a comparison of PoseNet’s performance across hardware, meant to help decide which hardware to use for a specific use case and whether optimizations can help. It also gives a glimpse into hardware capabilities in the wild. The hardware ranges from baseline prototyping platforms, through devices tailored for the edge, to production-grade CPUs.
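For readers who want to reproduce this kind of comparison on their own hardware, frames per second can be measured with a small timing loop. The sketch below is one way to do it and is not necessarily how the numbers in this article were produced; run_inference stands for whatever callable wraps your model, and the clock parameter exists only so the function can be exercised without real hardware.

```python
import time


def measure_fps(run_inference, frames, clock=time.perf_counter):
    """Run `run_inference` on every frame and return frames per second."""
    start = clock()
    processed = 0
    for frame in frames:
        run_inference(frame)
        processed += 1
    elapsed = clock() - start
    return processed / elapsed if elapsed > 0 else float("inf")
```

It is worth discarding the first few frames when benchmarking, since model loading and device warm-up can distort the average.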

Hardware Choices

  1. Raspberry Pi: The board of choice for prototyping. Although low powered, it gives a good initial understanding of what to expect and what to choose for production. It may not be able to run the DNN models, but it sure is fun.
  2. Movidius NCS + RPi 3B: Movidius Neural Compute Stick is a promising candidate if the model is to be run on the edge. The NCS contains a Vision Processing Unit (VPU), which is optimized to run deep neural networks.
  3. Ryzen 3: AMD’s quad-core CPUs are not a conventional choice for neural networks, but it is worth checking how the networks perform on the platform.
  4. GTX1030 + Ryzen 3: Adding an Nvidia GPU to the rig (granted, it is comparatively old, but it is cheap) allows us to benchmark what is possible on older cuDNN versions and GPUs.
  5. Movidius NCS + Ryzen 3: A desktop system allows for better and faster interfacing with the NCS. This setup is preferred while prototyping your edge application. A high-performance CPU allows rapid application development, while the NCS gives you the ability to run your models on your development laptop.
  6. Google Coral + RPi 3B: Google’s answer to on-edge ML is their Coral board, which has TPUs. Tensor Processing Units are used by Google’s gigantic AI systems, and Coral puts the compute power of TPUs into a small form factor. It has native support for Raspberry Pi too.
  7. Google Coral + Ryzen 3: As mentioned in the Movidius NCS + Ryzen 3 entry, it is going to be insightful to see how Coral interfaces with a Ryzen 3 based computer.
  8. GTX1080 + i7 7th Gen: Top-of-the-line system with a GTX1080 and an Intel i7 CPU. This is the highest performing combination in the list.

Repositories and models used:

  1. PoseNet (tfjs version)
    • Based on MobileNetV1_050
    • Based on MobileNetV1_075
    • Based on MobileNetV1_100
  2. PoseNet (Google Coral version)
  3. Read our previous blog post to get Movidius versions of PoseNet

Comparing Edge Compute Units

Google Coral’s PoseNet repository provides a model based on MobileNet 0.75 which is optimized specifically for Coral. At the time of writing, the details of the optimizations have not been provided and it is not possible to generate models for MobileNet 0.50 and 1.00.

Google Coral vs Intel Movidius

The optimized Coral model gives an exceptional 77 FPS on the Ryzen 3 system. However, the same model gives ~9 FPS when running on a Raspberry Pi.

Movidius also performs differently on the RPi and the Ryzen 3 system, and is generally faster on the Ryzen 3.

Comparing Desktop CPUs and GPUs

The results align with expectations when comparing the CPU with the GTX 1030 and GTX 1080. The high-end GPU outperforms the other candidates by a huge margin. However, the competition between Ryzen 3 and GTX 1030 is close.

Ryzen vs GTX 1030 vs GTX 1080

Final Thoughts

The following chart shows frames per second for a standard video input:

Frames per second

Google Coral, when paired with a desktop computer, outperforms every other platform, including the GTX1080.

Other noteworthy results are:

  1. When paired with Raspberry Pi 3, Coral gives ~9FPS. The reason behind the result is not yet explained but is being looked into.
  2. GTX1080 performs almost equally regardless of the model size.
  3. Movidius NCS performs better than GTX1030.
  4. Raspberry Pi is not able to run the models at all.

Different hardware gives a different flavor of performance, and there is scope for model optimization (quantization for example). It may not always be necessary to go with a high-end GPU such as GTX 1080 if your use case allows for a good trade-off between accuracy and speed/latency.
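To give a feel for what quantization means, the sketch below maps floating-point values onto 8-bit integers with a scale and a zero point, then maps them back. It is a simplified illustration of the general idea and not the scheme used by any of the toolkits or models benchmarked here; the function names are our own.

```python
def quantize_uint8(values):
    """Map floats to integers in 0..255 using a single scale and zero point."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # make sure 0.0 is representable
    scale = (hi - lo) / 255.0 or 1.0     # fall back to 1.0 if all values are 0
    zero_point = int(round(-lo / scale))
    quantized = [
        max(0, min(255, int(round(v / scale)) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [(q - zero_point) * scale for q in quantized]
```

Each value now fits in one byte instead of four, at the cost of a rounding error of roughly one quantization step, which is where the trade-off between accuracy and speed comes from.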

Our analysis shows that pairing the right hardware with a well-optimized neural network is essential, and that finding this pairing may require in-depth comparative analysis.

Why We Chose Rust over Lua for Our Project

Choosing the programming language(s) for a new product is an important strategic decision. It influences a lot of things and has long-term implications for hiring, culture and even the viability of the product.

But the first thing to consider is whether the language is viable for the particular problem you are trying to solve.

Important questions are:

  • How suitable is the language for your particular use case?
  • Will it perform up to the mark?
  • Will it run on the targeted platform(s)?

These should be the primary questions. But there are more things that might influence your decision, such as:

  • How will choosing a particular language influence your turnaround time from idea to reality?
  • What are the cost benefits of using a particular language?
  • How easy will it be to solve new problems that you might stumble upon along the way?

Keeping these questions in mind, this article will try to explain our reasoning behind choosing Rust for our new product.

Use Case

Our problem statement was to use an edge device that can process data from different sources in real time and create a knowledge graph. Therefore, the language we chose had to be fast enough to keep real-time latency to a minimum, while working within the limited resources of an SoC device.


Languages performance comparison fig.1

Comparing the cross-language performance of real applications is tricky. We usually don’t have the same expertise in multiple languages, and performance is influenced much more by the algorithms and data structures the programmer chooses to use. But as the benchmarks above show, Rust generally performs on par with C++, and much better than interpreted or JIT-based languages such as Lua or Python.


As described in the use case above, we wanted to process data in real time from multiple sensors. Our target platform, SoC devices, use ARM-based CPUs and generally have 4+ cores. We wanted to utilize all CPU cores, which meant multithreading support was important.

Lua does not have native multithreading support. There are third-party workarounds, but their performance and reliability are questionable. Rust, on the other hand, has built-in support for multithreading, and its ownership and borrowing rules help us write very safe concurrent code.

Memory Safety

Dynamically typed languages give you a lot of flexibility. Type changes do not need manual propagation through your program. They also give you more mental flexibility, as you can think more in terms of transformations, operations and algorithms. Flexibility lets you move faster, change things quickly and iterate at a higher velocity. But it comes at a cost: it’s a lot easier to miss potential problems, and these problems are generally very hard to debug. Plus, these features generally come with a performance penalty.
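The kind of problem that slips through is easiest to see in a dynamically typed language. The snippet below uses Python purely for illustration; the function and the sample readings are made up. Nothing complains about the bad input until the offending line actually runs.

```python
def average_latency(samples):
    """Average a list of latency readings in milliseconds."""
    return sum(samples) / len(samples)


readings_ok = [12.5, 11.0, 13.5]
readings_bad = [12.5, "11.0", 13.5]  # one sensor value arrived as text

print(average_latency(readings_ok))

try:
    average_latency(readings_bad)
except TypeError as error:
    # The mistake only surfaces at runtime, when this code path is exercised.
    print("runtime failure:", error)
```

If the bad reading only shows up on rarely exercised code paths, the failure may not appear until the device is already deployed.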

On the other hand, in a statically typed language a large number of errors are caught early in the development process. Static typing also usually results in compiled code that executes more quickly, because when the compiler knows the exact data types in use, it can produce optimized machine code. Static types also serve as documentation.

Rust goes above and beyond these points. Rust’s very strict and pedantic compiler checks each and every variable you use and every memory address you reference. It prevents possible data races and flags code that could lead to undefined behavior.

The right part of the chart above shows concurrency and memory safety issues. These are the most complex and unpredictable classes of errors and are fundamentally impossible to get in the safe subset of Rust. Moreover, all these type-related bugs are dangerous and result in a variety of security vulnerabilities.

Type safety is one of Rust’s biggest selling points and is the reason Rust has topped the most loved language list for 3 consecutive years in the Stack Overflow surveys.

Rust achieves this feat through the concept of ownership of a variable. In Rust, every value has an “owning scope,” and passing or returning a value means transferring ownership to a new scope. You can also lend out access to the functions you call; that’s called “borrowing”. Rust ensures that these leases do not outlive the object being borrowed. This not only makes it very type safe but also helps you tackle concurrency head-on, because memory safety and concurrency bugs often come down to code accessing data when it shouldn’t.

Developer Experience

Rust has a steep learning curve. Most of it is due to the “ownership” and “borrowing” concepts discussed above, which make Rust more difficult and time consuming to learn than garbage-collected languages like Lua or Python. It requires you to be very aware of basic computing principles regarding memory allocation and concurrency, and to keep these principles in mind while implementing. This should be the case for any language, but in Rust in particular the compiler explicitly forces you to write optimal, memory-safe code.

Yet Rust has a lot of features and conveniences that almost make it feel like a high-level language, despite the fact that you are doing the kind of manual memory management you would do in C++. Rust’s abstractions make it not feel like manual memory management anymore.

Low-level control combined with high-level safety promises developers far more control over performance without having to take on the burden of learning C/C++, or assume the risks of getting it wrong.


When you want to implement a high-performance concurrent system with a low resource footprint, the choice of programming languages is limited. Interpreted languages tend to perform poorly in high-concurrency, low-resource environments. Systems programming languages are the ideal candidates for such use cases.

C/C++ is the holy grail of systems programming languages. But there’s a reason C and C++ are also among the most dreaded languages in the Stack Overflow surveys. For newer programmers coming from higher-level languages, approaching C/C++ is hard. The learning curve is very steep. There are approximately 74 different build systems and a patchwork of 28 different package managers, most of which only support a certain platform or environment and are useless outside of it. After 30+ years of evolution, new programmers have too much thrown at them.

Rust, on the other hand, is comparatively easier to approach, has a sizable community and does not come with decades of technical debt, yet provides comparable performance. Memory safety and easier concurrency are just added benefits.

Are you looking for a reliable technology partner for your ideas ? Talk to us

Car or Not a Car

Lessons from Fine Tuning a Convolutional Binary Classifier

Fine tuning has been shown to be very effective in certain types of neural net based tasks such as image classification. Depending upon the dataset used to train the original model, the fine-tuned model can achieve a higher degree of accuracy with comparatively less data. Therefore, we have chosen to fine tune ResNet50 pre-trained on the ImageNet dataset provided by Google.

We are going to explore ways to train a neural network to detect cars, and optimise the model to achieve high accuracy. In technical terms, we are going to train a binary classifier which performs well under real-world conditions.

Taken in a village Near Jaipur (Rajasthan, India) by Sanjay Kattimani http://sanjay-explores.blogspot.com

There are two possible approaches to train such a network:

  • Train from scratch
  • Fine-tune an existing network

To train from scratch, we need a lot of data: millions of positive and negative examples. The process doesn’t end at data acquisition. One has to spend a lot of time cleaning the data and making sure it contains enough examples of the real-world situations that the model is going to encounter in practice. The feasibility of the task is directly determined by the background knowledge and time required to implement it.

Basic Setup

There are certain requisites that are going to be used throughout the exploration:

  1. Datasets
    a. Stanford Cars for car images
    b. Caltech256 for non-car images
  2. Base Network
    ResNet (arXiv), pre-trained on ImageNet
  3. Framework and APIs
    a. TensorFlow
    b. TF Keras API
  4. Hardware 
    a. Intel i7 6th gen
    b. Nvidia GTX1080 with 8GB VRAM
    c. System RAM 16GB DDR4

Experiment 1

To start with a simple approach, we take ResNet50 without the top layer and add a fully connected (dense) layer on top of it. The dense layer contains 32 neurons with a sigmoid activation. This gives approximately 65,000 trainable parameters, which is plenty for the task at hand.

Model Architecture for experiment 1

We then add the final output layer having a single neuron with sigmoid activation. This layer has a single neuron because we are performing binary classification. The neuron will output real values ranging from 0 to 1.
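The parameter count can be checked with a little arithmetic. The sketch below assumes the ResNet50 base is frozen and that its output is pooled into a 2048-dimensional feature vector before the dense layer; the pooling step is our assumption, chosen because it reproduces the figure quoted above.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


RESNET50_FEATURES = 2048  # assumption: pooled output of ResNet50 without the top

hidden = dense_params(RESNET50_FEATURES, 32)  # 32-neuron dense layer
output = dense_params(32, 1)                  # single-neuron output layer
print(hidden, output, hidden + output)        # 65568 33 65601
```

Almost all of the trainable parameters sit in the 32-neuron layer; the output neuron adds only 33.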

Data Preparation

We are randomly sampling 50% of images as the training dataset, 30% as validation and 20% as test sets. Although there is a huge gap between the number of car and non-car images in the training set, it should not skew our process too much because the datasets are comparatively clean and reliable.
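A random 50/30/20 split of this kind can be sketched in a few lines. The function name, the fixed seed and the use of a plain list of items are illustrative choices, not the exact code used in the experiment.

```python
import random


def random_split(items, train=0.5, validation=0.3, seed=42):
    """Shuffle items and cut them into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    train_end = int(len(shuffled) * train)
    validation_end = train_end + int(len(shuffled) * validation)
    return (
        shuffled[:train_end],
        shuffled[train_end:validation_end],
        shuffled[validation_end:],  # whatever remains becomes the test set
    )
```

Note that a split like this pays no attention to class labels, so individual classes can end up over- or under-represented in any one of the three sets.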



As a trial run, we trained for one epoch. The graphs below illustrate that the model starts at high accuracy, and reaches near-perfect performance within the first epoch. The loss goes down as well.

Epoch Accuracy for Experiment 1
Epoch Loss for Experiment 1

However, validation accuracy does not seem very good compared to the training round, and neither does validation loss.

Validation Accuracy for Experiment 1
Validation Loss for Experiment 1

So, we ran for 4 epochs and were left with the following results:

Accuracy and Loss for four epochs
Validation accuracy and validation loss for four epochs

The model performs relatively well, except for the high degree of separation between training and validation losses.

Experiment 2

We decided to keep the model architecture the same as the one we used in the first experiment, using the same ResNet50 without the top layer and adding a fully connected (dense) layer on top of it containing 32 neurons activated with sigmoid activator.

Model Architecture for experiment 2

Data Preparation

This is where the problem lay in the previous experiment. The train/validation/test data splits were random. The hypothesis was that the randomness had added more images of some cars, and too few of others, causing the model to be biased.

So, we took the splits as given by the Cars dataset and added 3000 more images by scraping the good old Web.
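One way to test a hypothesis like this is to compare how large a share each class takes up in the different splits. The sketch below is our own illustration, not part of the original experiment; the labels stand for whatever class names your dataset uses.

```python
from collections import Counter


def class_share(labels):
    """Return each class's share of a split as a fraction of its images."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def largest_gap(train_labels, validation_labels):
    """Largest absolute difference in class share between two splits."""
    train = class_share(train_labels)
    validation = class_share(validation_labels)
    classes = set(train) | set(validation)
    return max(abs(train.get(c, 0.0) - validation.get(c, 0.0)) for c in classes)
```

A gap close to zero means both splits see the classes in similar proportions; a large gap points to the kind of imbalance suspected here.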



These results signify a substantial improvement in the validation accuracy when compared to the previous experiment.

Epoch Accuracy for experiment 2
Epoch Loss for experiment 2

Even though the accuracy matches fairly well, there is a big difference between the training loss and the validation loss.

Validation Accuracy for experiment 2
Validation Loss for experiment 2

This network seems more stable than the previous one. The only observable difference is that of new data splits.

Experiment 3

Here we add an extra dropout layer, which gives each neuron a 30% chance of being dropped from a training pass. Dropout is known to regularize models by preventing the biases caused by the interdependence of neurons.
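To make the mechanism concrete, here is a minimal sketch of what a dropout layer does to a list of activations during training. It uses the common "inverted dropout" formulation, in which the surviving values are scaled up so that the expected sum stays the same; the function is illustrative and is not the framework's implementation.

```python
import random


def dropout(activations, rate=0.3, rng=None):
    """Zero each activation with probability `rate` and rescale the rest."""
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]
```

At inference time no units are dropped and the activations pass through unchanged.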

Model Architecture for experiment 3

Since we have a comparatively huge pre-trained network and smaller trainable network, we could add more dense layers to see the effects. We did that and the model ended up achieving saturation in fewer epochs. No other improvements were observed.

Data Preparation

Just like in experiment 2, the default train/validation splits are taken.


Here, we have run the model on a single learning rate but the value can be experimented with. We will talk about the effects of batch size on this network in the results section.


The results here are with the batch size of 32. As seen, in 3 epochs the network seems to saturate (although it might be a bit premature to judge this).

Epoch accuracy for experiment 3
Epoch Loss for experiment 3

At the same time validation accuracy and loss also seem to be performing well.

Validation Accuracy for experiment 3
Validation Loss for experiment 3

So, we increased the batch size to 128, hoping it would help the network find a better local minimum and thereby give better overall performance. Here is what happened:

Epoch Accuracy and Loss for batch size of 128
Validation Accuracy and Loss for batch size of 128

The model now performs reasonably well on both training and validation sets. The losses between training and validation runs are not too far apart either.

Model Drawbacks

Obviously, the model is not one hundred percent accurate. It does provide certain failed classifications as a result.


When we ran this model on the testing dataset, it failed on only 7 images across the car and non-car sets combined. This is a very high degree of accuracy and brings the model closer to production usage.

In conclusion, we can safely assert that dataset splits are crucial. Rigorous evaluations and experimentation with various hyper-parameters give us a better idea of the network. We should also think about modifying the original architecture based on the evidence provided by the various hyper-parameters.
