Sunday, December 15, 2019

Hack the LIME - the story about open-source AI

Leave it to a lawyer to spoil the fun!  Andrew Burt, a lawyer and writer, recently wrote an article for the Harvard Business Review that discusses the risks of open-source Artificial Intelligence (AI) models.  The HBR post, "The AI Transparency Paradox," explains the dangers of making artificial intelligence models transparent.  On the one hand, businesses want to know how companies engineered their AI algorithms.  Is the AI algorithm sexist, racist, or simply inaccurate?  For instance, in James Vincent's article, Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech, Google "fixed" its 2015 deep learning algorithm, which had mislabeled people as gorillas.  I am sure the appropriate stakeholders at Google wanted to know how the algorithm was written and how the developers could fix its inappropriate bias.

On the one hand, if Google opened its AI code to the open-source community, the AI algorithms could be significantly improved.  On the other hand, our no-fun lawyer friend Andrew Burt cautions businesses that they would be vulnerable if they opened their AI algorithms to the world, which includes bad actors like hostile governments.  Andrew cites the research paper titled “Why Should I Trust You?” Explaining the Predictions of Any Classifier, which introduced the Local Interpretable Model-Agnostic Explanations (LIME) algorithm.  Andrew then discusses another research paper, How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods, published in November 2019.  That paper describes a "novel scaffolding technique" that can be used to hack the LIME algorithm.  Is there a return on investment if companies make their algorithms more transparent?
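The core idea behind LIME is simple enough to sketch in a few lines: perturb an input, query the black-box model on the perturbations, and fit a simple linear model weighted by proximity to the original input; the linear model's coefficients act as local feature importances.  Below is a minimal illustrative sketch in NumPy — the `black_box` function and all parameter values are hypothetical stand-ins, not the authors' reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical opaque model we want to explain locally.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def lime_explain(x, predict, n_samples=500, width=0.75):
    """Fit a local weighted linear surrogate around instance x (LIME's core idea)."""
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbations.
    y = predict(Z)
    # 3. Weight each perturbed sample by proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(w) and solve.
    A = np.hstack([Z, np.ones((n_samples, 1))])  # add intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # per-feature local importance (drop intercept)

x0 = np.array([0.0, 1.0])
importances = lime_explain(x0, black_box)
```

Near `x0 = (0, 1)` the second feature matters roughly twice as much as the first (the local slope of `x**2` at 1 is 2), and the surrogate's coefficients reflect that.  The scaffolding attack in the paper above works precisely because the explanation is built from these out-of-distribution perturbations, which an adversarial model can detect and answer innocently.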

Folks may argue that Google successfully made its TensorFlow machine learning (ML) platform open source without much impact.  Before that question can be answered, the following terms need to be defined:
  • Machine Learning;
  • Deep Learning; and 
  • Artificial Intelligence.
According to the GeeksforGeeks portal, "Machine Learning is a technique of parsing data, learn from that data, and then apply what they have learned to make an informed decision."  Machine Learning includes supervised training, where the machine builds the model on most of the data (70% to 90% of the complete dataset) and then tests the model's accuracy on the remaining portion.  Unsupervised training involves the machine learning patterns from the full, unlabeled dataset; there is no held-out test portion.
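The supervised train/test split described above can be sketched in plain Python — an illustrative sketch, not any specific library's API:

```python
import random

def train_test_split(dataset, train_fraction=0.8, seed=42):
    """Shuffle the dataset and split it: the model is built on the
    training portion (70%-90%) and scored on the held-out remainder."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

samples = [(x, 2 * x) for x in range(100)]  # toy labeled dataset
train, test = train_test_split(samples, train_fraction=0.8)
# 80 samples to build the model, 20 held out to test its accuracy
```

Shuffling before splitting matters: if the dataset is ordered (say, by date or by label), a straight slice would give the model a test set it has never seen anything like.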

Deep Learning algorithms are a specific subset of Machine Learning algorithms, built primarily on neural networks.  A common use of deep learning algorithms is image recognition.
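To make "neural network" concrete, here is a toy forward pass through a single hidden layer in NumPy.  The layer sizes and random weights are arbitrary for illustration; real deep-learning image models stack many such layers (plus convolutions) and learn the weights from data:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy network: 4 input features -> 3 hidden units -> 2 class scores.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)       # hidden layer with ReLU activation
    scores = h @ W2 + b2                 # output layer: one score per class
    e = np.exp(scores - scores.max())    # softmax turns scores into
    return e / e.sum()                   # probabilities that sum to 1

probs = forward(np.array([0.5, -1.0, 2.0, 0.1]))
```

"Deep" simply means many of these layers are chained together, each transforming the previous layer's output.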

Gartner defines Artificial Intelligence (AI) as the application of "...advanced analysis and logic-based techniques, including machine learning, to interpret events, support and automate decisions, and take actions."

Gartner recommends several open-source Data Science and ML platforms, including TensorFlow, for developing ML-based solutions.  Does this mean Mr. Burt is incorrect?

Andrew Burt is correct because compromised algorithms, which are the core of all AI solutions, will cause businesses to make poor decisions and ultimately discredit the AI solution.  The underlying technology, which invokes the algorithm, pulls the data, processes it, and reports the outcomes, can be open source.  The other critical piece of any AI solution is the data itself.  Sensitive data should never be shared unless requested by law enforcement or a regulatory agency.

In conclusion, Andrew Burt brings up valid points, but at the same time, the AI field is still pretty young.  I am a strong believer in open-source software, and I believe we will eventually have open-source AI models available.  We will also have AI models that validate other models using technologies like blockchain.

NOTE: The video below is about the LIME algorithm; please keep in mind that researchers recently hacked this algorithm.
