Friday, December 27, 2019

Data generated by its citizens as a national asset...

The following statement in a Harvard Business Review (HBR) article, published on December 18, 2019, caught my eye:
"India is a nation state, it would treat the data generated by its citizens as a national asset, store and guard it within national boundaries, and reserve the right to use that data to safeguard its defense and strategic interests."[1]  The HBR article is about India's proposed legislation to protect its consumer data. The bill is called the Personal Data Protection (PDP) Bill.

On December 24, 2019, The New York Times published an article called Pentagon Warns Military Personnel Against At-Home DNA Tests.[2] The article discusses a Department of Defense (DoD) internal memo that discourages military personnel from taking mail-in DNA tests.  According to DoD leadership, the tests:
  • Are unreliable;
  • Negatively impact service members' careers; and 
  • Create security risks.
I am not sure if there is any relationship between the two articles, but together they highlight the dangers of international privacy laws, which are still evolving.  I believe our DNA is our most private and personal information.  If countries like India push laws like the PDP Bill, which allow a government to use its citizens' personal information, including DNA, for national and strategic interests, then folks need to be careful about their personal data.  Nations cannot use their citizens' data if the data is never generated.

The next question to ask is whether these laws will get less complicated.  The answer lies in the history of privacy laws.

According to the HBR article's authors, the proposed PDP Bill is based on the European Union's (EU) General Data Protection Regulation (GDPR).  According to an EU website, the GDPR entered into force on May 24, 2016, and the EU began enforcing it on May 25, 2018.[3]  The EU drew on the US Department of Commerce's National Institute of Standards and Technology (NIST) recommendation called Guide to Protecting the Confidentiality of Personally Identifiable Information (PII) (Special Publication 800-122)[4], which was released in April 2010.  The state of California passed the California Consumer Privacy Act (CCPA) in June 2018.  The CCPA is based on the GDPR, with modifications.[5]

With each data regulation, bill, and recommendation released over time, nations and states are trying to protect their citizens and their data, but some countries (like India, China, and others) may use the data to further political and national security agendas.

As members of the digital age, we cannot cut ourselves off from the internet, but we need to be aware of the risks, policies, and rights, since they are constantly evolving like the technologies around us.  Militaries and other government bodies may ask their citizens to share their DNA for the good of their countries. Still, I am not a fan of businesses monetizing, or criminal syndicates stealing, my data. I believe privacy laws, in general, are good, but we in the US need a single, cohesive piece of data protection legislation and a watchdog organization that focuses on digital privacy.  I am not a policy wonk, but I believe national and international privacy laws need to mature as data grows exponentially and technologies like artificial intelligence permeate our lives over the next ten years.

Until then, please be careful what you share on the web, including your DNA results.  Currently, our DNA information in the US is protected via the Genetic Information Nondiscrimination Act of 2008 (GINA).[6]  That being said, presidents change, legislators change, Supreme Court justices change, governments may change, but your DNA information doesn't change.

[1] Govindarajan, V., Srivastava, A., & Enache, L. (2019, Dec. 18). How India Plans to Protect Consumer Data. Harvard Business Review. Retrieved from https://hbr.org/2019/12/how-india-plans-to-protect-consumer-data.
[2] Murphy, H., & Zaveri, M. (2019, Dec. 24). Pentagon Warns Military Personnel Against At-Home DNA Tests. The New York Times. Retrieved from https://www.nytimes.com/2019/12/24/us/military-dna-tests.html.
[3] Data protection in the EU. European Commission. Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/data-protection-eu_en.
[4] McCallister, E., Grance, T., & Scarfone, K. (2010, Apr.). Guide to Protecting the Confidentiality of Personally Identifiable Information (PII). NIST Special Publication 800-122. Retrieved from https://csrc.nist.gov/publications/detail/sp/800-122/final.
[5] Korolov, M. (2019, Oct. 4). California Consumer Privacy Act (CCPA): What you need to know to be compliant. Retrieved from https://www.csoonline.com/article/3292578/california-consumer-privacy-act-what-you-need-to-know-to-be-compliant.html.
[6] The Genetic Information Nondiscrimination Act of 2008 (2008, May 21). U.S. Equal Employment Opportunity Commission (EEOC). Retrieved from https://www.eeoc.gov/laws/statutes/gina.cfm.
[7] Picture for this blog post retrieved from https://pixabay.com/illustrations/dna-matrix-genetics-control-3888228/.

Friday, December 20, 2019

"Dear Santa" letter from a vigilant kid

Dear Santa,

I cannot send you my Christmas letter because the letter will have a lot of my personal data, aka personally identifiable information (PII). Due to privacy laws and potential stalkers, my parents forbade me to send you a letter or an email. With the various cybersecurity issues out there, I am not sure if my email will end up in the wrong hands, and my daddy, mommy, brother, sister, and I may get socially engineered and phished.

Since I still believe in you, here is a compromise: I will share my Amazon wish list, and you can use it to send me toys. It will have to happen magically because my parents forbade me to share my address with you. I noticed that Amazon now recommends items that I can add to my Amazon wish list. I wonder if Amazon is learning my behavior on Amazon.com. Doesn't that compromise my privacy? Mommy and Daddy tell me privacy laws are here to protect me, but I am not sure if websites like Amazon.com, Google.com, and Yahoo.com should use my data to make money.

Nevertheless, please send me your email address and I will share my Amazon wish list.

Merry Christmas Santa,

Your Friend

P.S.  My parents forbade me to share my name with you.

NOTE: I used the picture from https://spaceshipsandlaserbeams.com/20-free-printable-letters-to-santa/.

Sunday, December 15, 2019

Hack the LIME - the story about open-source AI

Leave it to a lawyer to spoil the fun!  Andrew Burt, a lawyer and writer, recently wrote an article for the Harvard Business Review that discusses the risks of open-source Artificial Intelligence (AI) models.  The HBR post, "The AI Transparency Paradox," explains the dangers of making artificial intelligence models transparent.  On the one hand, businesses want to know how companies engineered their AI algorithms.  Is the AI algorithm sexist, racist, or just not good?  For instance, in James Vincent's article, Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech, Google "fixed" its 2015 deep learning algorithm, which had labeled people as gorillas. I am sure the appropriate stakeholders at Google wanted to know how the algorithm was written and how the developers could fix the inappropriate bias in it.

On the one hand, if Google opened its AI code to the open-source community, the AI algorithms could be significantly improved.  On the other hand, Andrew Burt (our no-fun lawyer friend) cautions businesses that they would be vulnerable if they opened their AI algorithms to the world, which includes bad actors like hostile governments.  Andrew cites the research paper titled “Why Should I Trust You?” Explaining the Predictions of Any Classifier, which introduced the Local Interpretable Model-Agnostic Explanations (LIME) algorithm.  Andrew then discusses another research paper, How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods, which was published in November 2019.  That paper describes a "novel scaffolding technique," which can be used to hack the LIME algorithm.  Is there a return on investment if companies make their algorithms more transparent?
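To make the LIME discussion concrete, here is a minimal sketch of how the open-source lime package explains a single prediction of a black-box classifier.  The iris dataset and the random forest model are my own illustrative stand-ins (this assumes the lime and scikit-learn Python packages), not examples from either paper.

    # A minimal LIME sketch: explain one prediction of a black-box model.
    # Illustrative only; the dataset and model are stand-ins.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    iris = load_iris()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(iris.data, iris.target)

    # LIME perturbs the instance, observes how the black-box predictions
    # change, and fits a simple, interpretable local surrogate model.
    explainer = LimeTabularExplainer(
        iris.data,
        feature_names=iris.feature_names,
        class_names=list(iris.target_names),
    )
    explanation = explainer.explain_instance(
        iris.data[0],          # the single prediction to explain
        model.predict_proba,   # the black-box prediction function
        num_features=4,
    )
    print(explanation.as_list())   # per-feature weights for this prediction

The adversarial paper above targets exactly the perturbation step in this workflow: a "scaffolded" model can behave innocently on LIME's perturbed samples while discriminating on real inputs, so the explanation looks clean.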

Folks may argue that Google successfully made its TensorFlow machine learning (ML) platform open source without much negative impact.  Before the TensorFlow platform question can be answered, the following terms need to be defined:
  • Machine Learning;
  • Deep Learning; and 
  • Artificial Intelligence.
According to the Geeks for Geeks portal, "Machine Learning is a technique of parsing data, learn from that data, and then apply what they have learned to make an informed decision."  Machine Learning includes supervised training, where the machine builds a model on most of the data (70% to 90% of the complete dataset) and then tests the model's accuracy on the remaining portion.  Unsupervised training involves the machine learning patterns from the full, unlabeled dataset; there is no held-out test against labeled answers.
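To make the supervised train-then-test loop concrete, here is a minimal scikit-learn sketch (my own example, not from the Geeks for Geeks article; the 80/20 split is one choice within the 70%-to-90% range mentioned above).

    # Supervised learning: build the model on most of the data,
    # then test its accuracy on the held-out remainder.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Train on 80% of the dataset...
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # ...and test for accuracy on the remaining 20%.
    print("accuracy:", model.score(X_test, y_test))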

Deep Learning algorithms are a specific subset of Machine Learning algorithms, which are primarily composed of neural networks. A common use of deep learning algorithms is image recognition.
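As an illustration, a deep learning model for image recognition is typically a stack of convolutional neural network layers.  The Keras (TensorFlow) sketch below is my own; the input size and the ten output classes are assumptions, not a tuned model.

    # A minimal convolutional neural network for image recognition.
    # Input: 28x28 grayscale images; output: one of 10 assumed classes.
    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),  # 10 image classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()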

Gartner defines Artificial Intelligence (AI) as the application of "...advanced analysis and logic-based techniques, including machine learning, to interpret events, support and automate decisions, and take actions."

Gartner recommends several open-source Data Science and ML platforms, including TensorFlow, to develop ML-based solutions.    Does this mean Mr. Burt is incorrect?

Andrew Burt is correct because compromised algorithms, which are the core of all AI solutions, will cause businesses to make poor decisions and ultimately discredit the AI solution. The underlying technology, which invokes the algorithm, pulls the data, processes the data, and reports the outcomes, can be open source.  The other critical piece of any AI solution is the data itself.  Sensitive data should never be shared unless requested by a law enforcement or regulatory agency.

In conclusion, Andrew Burt brings up valid points, but at the same time, the AI field is still pretty young.  I am a strong believer in open-source software, and I believe we will have open-source AI models available.  We will also have AI models that validate other models, using technologies like blockchain.

NOTE: The video below is about the LIME algorithm; please keep in mind that researchers recently hacked this algorithm.

Saturday, December 7, 2019

Check out Manna and "Empathy driven AI"

My summer reading included a book called Manna: Two Visions of Humanity's Future by Marshall Brain, which describes a world with robots and artificial intelligence (AI). In the book, there are two specific implementations of this technology: one focused on automating businesses, and the other focused on improving the quality of life.
I am not sure if we will reach either state, but it was good to get a perspective on how the world could be.  Personally, as a trained data scientist, I don't envision robots taking over my life.  With any technology, we need to use it with pragmatism.  AI will make decision-makers aware of their biases and tendencies, and decisions made "based on gut feelings" will be reduced.  At the same time, not all decision-makers should be making identical decisions, since they may not have similar emotions about a decision.

Speaking of emotions, AI engineers are currently developing AI algorithms to mimic and sense human emotions.  Chatbots and virtual assistants like Alexa, Google Assistant, and others want to detect emotion through user interactions and provide better answers.  AI-driven emotion detection should deliver better sales and an improved customer experience. Allison Snow, then a Senior Analyst at Forrester, posted a blog entry on November 12, 2018, titled AI Tech Shines A Path To Empathetic Triggers, in which she connects three independent concepts, empathy, insights, and AI, within the context of B2B marketing. On October 14, 2019, Gartner published the article Digital Empathy: A New Lever for Earnings Growth by Kristin Moyer, Dave Aron, and Don Scheibenreif.  Gartner typically takes a concept and brands it with its own buzzwords; in this case, Gartner calls AI-driven emotions "Digital Empathy."
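To give a feel for how sensing emotion in text can look in code, here is a minimal sketch using NLTK's VADER sentiment analyzer.  This is a simple stand-in of my own choosing; the "digital empathy" systems described above go far beyond sentiment scores.

    # Sense the emotional tone of user messages with NLTK's VADER analyzer.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")   # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()

    messages = [
        "I love this assistant!",
        "This is the worst customer support ever.",
    ]
    for message in messages:
        scores = analyzer.polarity_scores(message)
        # 'compound' runs from -1 (very negative) to +1 (very positive);
        # a chatbot could, say, escalate angry users to a human agent.
        print(message, "->", scores["compound"])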

With all innovative ideas, you will find folks who will talk about their dangers.  In the article Empathy in Artificial Intelligence on Medium.com, Jun Wu writes about AI, human emotions, and the ethics around them.  Last month, Harvard Business Review (HBR) published an article called The Risks of Using AI to Interpret Human Emotions by three Accenture employees (Mark Purdy, John Zealley, and Omaro Museli).  The authors discuss biases seeping into AI, since AI is nothing more than decisions based on past data.

Overall, the AI-driven empathy concept is entirely new. I wonder how well the Nigerian prince scam would work if empathetic AI could dupe us into giving money. I feel bad for the security folks, but at the same time, there is a business opportunity. All of these ideas are reflected in the book Manna: Two Visions of Humanity's Future.  Check out the book!

NOTE: I used the image from Amazon.com.