Welcome to the Center for Machine Learning and Big Data Analytics


  • Herding Behavior and Event Detection in Social Networks

    Social synchrony is a phenomenon studied in social science. We study this phenomenon in social media networks and define it formally. We propose a method to detect it in the Twitter network, along with further methods that use it to decide whether a specific engagement denotes an event or not. One of the measures we define is herding behavior. This study leads us to examine the distinctive behavioral patterns of users, especially on Twitter.

  • Modeling of Information Diffusion

    We study the effect of exogenous sources on the spread of infection during pandemics. By exogenous we mean originating outside the population of interest. Infection during a pandemic can spread in two ways: endogenously, within the population of interest, and exogenously, from a source outside that population. We model this with a new compartmental model called Exo-SIR, an extension of the SIR model. We then apply it to COVID-19 data from India and Ebola data from Guinea to examine the interplay between endogenous and exogenous infections.
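
    As a rough illustration of the idea, the sketch below adds a constant external force of infection to a standard SIR model and keeps separate counts of endogenous and exogenous infections. It is not the published Exo-SIR formulation: the function name, the constant `exo_rate` term, the parameter values and the simple Euler stepping are all assumptions made for this example.

```python
def simulate_sir_with_exogenous(n, i0, beta, gamma, exo_rate, days, dt=0.1):
    """Euler-step a SIR model with an added external infection term.

    Assumed (illustrative) dynamics, not the published Exo-SIR equations:
        dS/dt = -beta * S * I / n - exo_rate * S
        dI/dt =  beta * S * I / n + exo_rate * S - gamma * I
        dR/dt =  gamma * I
    Returns one tuple per day:
        (day, S, I, R, cumulative_endogenous, cumulative_exogenous)
    """
    s, i, r = float(n - i0), float(i0), 0.0
    endo_total = 0.0  # infections caused by contact inside the population
    exo_total = 0.0   # infections caused by the outside source
    steps_per_day = round(1 / dt)
    history = []
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            endo = beta * s * i / n * dt
            exo = exo_rate * s * dt
            recovered = gamma * i * dt
            s -= endo + exo
            i += endo + exo - recovered
            r += recovered
            endo_total += endo
            exo_total += exo
        history.append((day, s, i, r, endo_total, exo_total))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: nobody is infected at the start,
    # so the outbreak can only be seeded from outside.
    run = simulate_sir_with_exogenous(
        n=1000, i0=0, beta=0.3, gamma=0.1, exo_rate=0.001, days=50
    )
    day, s, i, r, endo, exo = run[-1]
    print(f"day {day}: endogenous={endo:.1f}, exogenous={exo:.1f}")
```

    Setting `exo_rate` to zero with no initial infections leaves the population untouched, which is the closed-population behaviour of plain SIR. A positive `exo_rate` seeds the outbreak from outside, after which endogenous spread takes over.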

  • Bias and Privacy in the Web

    The web is pervasive, and user data is present all over it in various formats. This data can now be collated from the public domain automatically, then integrated and augmented to reveal users' personal information. It has therefore become imperative for people who use the web and people who manage it (on social media the two are almost the same!) to take care of the privacy and security of user data. In addition, inherent human biases are present throughout the data on the internet, and ML algorithms trained on that data learn those biases, often to a large degree. This affects people who rely on ML algorithms to make decisions. Moreover, in the name of personalisation, people are shown content that matches their preferences, which gives rise to the filter bubble effect. Our work on these two problems studies bias in Google Ad personalisation, Google search and the media, as well as various privacy issues in Indian government websites.

    With no dedicated privacy law enacted in India, it is interesting to measure how much tracking happens on Indian websites. We look at the tracking ecosystem in Indian news media websites and show that partisan websites follow different ways of tracking their users. We also look at topical pages on these websites and show that they carry much more tracking than the front pages (homepages). Finally, we show that a considerable number of agencies follow preferential attachment when placing cookies on topical webpages.

  • Action Recognition in Cricket Videos

    Videos are unstructured data, so searching within them is a highly non-trivial task. In our work we take cricket videos and attempt to label the kind of shot played by the batsman. Doing this with ML techniques requires a labelled dataset, and building one is a challenging task. Once built, however, it becomes a benchmark for addressing various other problems. For example, this dataset serves as the training data for our work on classifying cricket shots. Once we can classify shots, we can automatically label cricket videos by the shots played, which makes the videos searchable and segmentable by shot.

  • Music Information Retrieval

    In our work we explore a novel method called thumbnailing and how it can be used in Carnatic music to help classify some of the Rakti Ragas, which may have many unique but repetitive phrases. This work is in collaboration with a team of researchers from IIIT Hyderabad. As a separate problem, we look specifically at the Ragam Tanam Pallavi (RTP) of a concert and at building a classification algorithm, using signal processing and machine learning techniques, that can automatically detect the segments of an RTP, such as the Raga Alapana section, the Pallavi section, the Swara Prastara section and the Tani Avartanam section.

  • Impact and the Dynamics of Exogenous Sources of Infection in the Spread of COVID-19
  • Tracking Ecosystem in the Indian News Media Websites
  • Digital Surveillance and Understanding Its Chilling Effect on Journalists