Machine Learning on Apache Spark at Scale
Debajyoti Ray
Chief Data Officer
VideoAmp Inc.


Tuesday, January 31, 2017
11:30 AM - 12:15 PM

Level:  Technical - Intermediate

Consumers now view the same content seamlessly across multiple devices. This shift in consumer behavior is at odds with the way advertising is sold: TV and online inventory are sold separately, in silos. That creates an opportunity to bridge the gap and make advertising more effective using data and machine learning.

We'll discuss the developments made at VideoAmp to bring together data from disparate media to build a large-scale consumer graph using Apache Spark for 150 million users across 2 billion nodes. Machine learning and graph analytics methods are then used to build audience models for cross-screen bid optimization, frequency capping, and sequential targeting.

This talk will cover:

  • A brief overview of Machine Learning, in particular graph analytics, on Apache Spark
  • The architectural choices to run both low-latency streaming analytics and large batch jobs on the same platform
  • Our open-source project, Flint, to spin up on-demand Spark clusters for ad-hoc analytics and large-scale batch processing

Deb Ray is the Chief Data Officer at VideoAmp, where he is focused on developing the data platform and the data science that enables advertisers and media owners to transact seamlessly across devices with a Screen Optimization Platform.

Deb completed his PhD in Machine Learning and Economics at Caltech. At Caltech, he founded Pasadena Labs, an adtech company that built a hyperlocal ad platform used by clients such as Microsoft and IAC. Deb has also developed computer vision products at Microsoft Research, and machine learning algorithms for high-frequency trading. He has been awarded several patents, his publications have been highly cited, and he is a frequent speaker at AI and Big Data conferences.
