Saturday, October 19, 2024

Unicorns have an exciting future


I recently began taking courses toward a Digital Transformation certificate from Stanford University's School of Engineering.  I highly recommend it since it flexes your critical thinking and strategic acumen.  A few weeks ago, I completed the Building an AI-Enabled Organization course, which showcases how data science and artificial intelligence (AI) can be applied in the business and corporate world.  I thought I had completed my training when I obtained a master's in data analytics and business administration.

A couple of items that stuck out to me from the course:
  1. Data Drifts & Model Drifts -  Data drift is when the statistical distribution of the data changes over time. For instance, electric cars were rare before the 2000s, and the percentage of gasoline-powered cars has fallen since then.  Model drift, aka concept drift, is when a learning model becomes outdated because it was developed using pre-drift data.
  2. Unicorns—Steven Geringer's Data Science Venn Diagram v2.0 showcases the rare breed of knowledge worker called unicorns.
Steven Geringer's Data Science Venn Diagram v2.0 (Mokrian, n.d.)
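
To make the data drift idea concrete, here is a minimal sketch that compares the share of each category in a training-era sample against a recent sample using the Population Stability Index (PSI). PSI is simply my choice of drift measure for illustration, and the vehicle counts are invented values, not real statistics.

```python
import math

def category_shares(counts):
    """Convert raw category counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def population_stability_index(baseline, current, floor=1e-6):
    """PSI = sum((cur - base) * ln(cur / base)) over all categories.

    `floor` keeps the logarithm defined when a category is missing
    from one of the two samples.
    """
    base_shares = category_shares(baseline)
    cur_shares = category_shares(current)
    psi = 0.0
    for category in set(base_shares) | set(cur_shares):
        base = max(base_shares.get(category, 0.0), floor)
        cur = max(cur_shares.get(category, 0.0), floor)
        psi += (cur - base) * math.log(cur / base)
    return psi

# Hypothetical vehicle counts, invented purely for illustration.
training_era = {"gasoline": 980, "electric": 20}
recent = {"gasoline": 850, "electric": 150}

psi = population_stability_index(training_era, recent)
print(f"PSI = {psi:.3f}")
```

A PSI of zero means the two samples have identical shares. A commonly cited rule of thumb treats values above roughly 0.25 as a shift large enough to justify retraining, though the right threshold depends on the model and the domain.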


These unicorns are rare since these data scientists combine subject matter expertise, computer science skills, and a background in math and science.  They reminded me of professionals who work in bioinformatics, nursing informatics, aviation informatics, and other informatics fields.  According to Wood (2023), the purpose of informatics is to design, develop, and maintain systems that capture, store, and manage domain-specific data.  Wood (2023) also states that data science aims to draw insights from the data and develop predictive analytics that enable key staff to make decisions.  Even though informatics and data science are distinct disciplines, they depend on each other to help any organization or program.  For instance, data scientists can only gain insights if the data quality is good, and data quality is an outcome of how systems capture and store the data.  If organizations want their data scientists to leverage multiple data sources to gain insights, then data management is key.  In other words, data scientists depend on informatics specialists, whose work determines the quality of the data they analyze.

If you or your loved ones want to be the next unicorn or be part of a unicorn team, I recommend pursuing an informatics degree in a specific domain and taking a handful of data analytics and data science courses.  The informatics degree will train you in a specific subject matter and expose you to computer programming.  Here is a list of colleges and universities and the types of informatics degrees they offer:
  • Applied Informatics - University of Technology, Russia;
  • Business Informatics - Modul University, Austria;
  • Cybersecurity Informatics - San Jose State University;
  • Health Informatics - George Mason University;
  • Informatics for Logistics - College of Logistics, Czech Republic;
  • Integrative Informatics - Allegheny College;
  • Maritime Logistics Informatics - University of Tasmania, Australia;
  • Social Informatics - HSE University, Russia;
  • Urban Informatics - Northeastern University; and
  • Others.
For domains, like sports, that already use data science,  colleges offer specialized degrees like:
  • Crime Analytics - Toronto Metropolitan University, Canada;
  • Nutrition and Health Analytics - Munster Technological University, Ireland;
  • Sports Analytics - American University; and 
  • Others
I predict colleges and universities will offer more domain-specific analytics degrees within the next few years.  With more domain-specific data being generated by various industries, data scientists will tune their algorithms to each domain for better insights and more focused decision-making.  I don't think there will be a slowdown in folks getting informatics degrees since industries still need human smarts to design how data should be captured, stored, and processed.  With the demand for platform technologies like Salesforce, Appian, and others, as well as platform businesses like Fiverr, Etsy, LinkedIn, and others, folks with data science and computer science degrees will have jobs.  If you have these skills plus expertise in niche domains like maritime, aviation, urban, nutrition, and others, you have a very bright future that can translate into more money for the next several years.  I also envision that platforms supporting specific domains will be created and may thrive.  I say "may thrive" because these platform businesses need savvy executives who have a feel for the markets and adopt agile business strategies.  The future is exciting in this space for unicorns, entrepreneurs, and disrupters.  Generative AI technologies will complement this space, but I don't see them ever replacing the unicorns since Generative AI is nothing more than a chatbot that overlays the data science engines and the unicorns who work with them.

In summary, if you or your loved ones want to make a living as unicorns, there are avenues to pursue, and I see an exciting future for the next five to ten years or even longer.

As always, please comment if you agree or disagree with my entry and subscribe to this blog.

References
Mokrian, P. (n.d.).  Building an AI-Enabled Organization.  Stanford Online University.

Wood, T. (2023, October 10).  Healthcare Data Science vs. Healthcare Informatics (and why the difference matters).  Fast Data Science.  https://bit.ly/4hbH0u3

Saturday, September 7, 2024

digitalEnoch: voice-generated AI


"Radio station that produces 'Radiolab'"
—That was the clue on the Wednesday, July 24th, 2024, New York Times (NYT) crossword puzzle.  The answer was WNYC, which in my mind stood for W—New York City (NYC).  Intrigued, I went to my podcast app and searched for RadioLab and WNYC, and discovered the Radiolab podcast.  

If you are a science/technology geek and listen to podcasts like me, please check out the weekly Radiolab podcast.  WNYC drops new episodes on Fridays, and they have expanded my sphere of knowledge and learning.  For instance, did you know that dolphins sleep with half their brain while they use the other half to swim and breathe, or that Argentine ants have killer instincts?  Sometimes, I wish the Argentine men's soccer team had killer instincts against their opponents.

As someone who works with software, cybersecurity, and data, including artificial intelligence (AI), I found yesterday's Radiolab episode, Shell Game, simply fascinating.  It is a good example of how sophisticated voice AI built on generative AI like ChatGPT, Google Gemini, and others is getting.  Radiolab host Latif Nasser interviews Evan Ratliff, a tech journalist who writes about the seedier side of technology.  Evan and his team created a voice AI that sounds just like Evan, and during the episode, Latif plays recordings of the digital Evan's interactions with regular people on the phone.  For instance, I heard the digital Evan interact with an online therapist, a $1200/hr attorney, entrepreneurs, etc.

It made me wonder how a digital version of myself—let’s call it digitalEnoch—would fare.  How would an AI model learn from me and mimic my behavior?  Would digitalEnoch be more analytical or more empathetic? For instance, would digitalEnoch use the word "perfection?"  You will need to listen to the RadioLab podcast to find out about the word "perfection."

In summary, check out this podcast if you want to see how generative voice AI is evolving.  Put on your cybersecurity hat; it's a fascinating future.  I wouldn't say scary because policies, processes and technology are also evolving to put "bad guys" in check.   Here are a couple of links:

As always, please feel free to comment below and follow this blog.

Saturday, August 17, 2024

Give me the list of the top 50 quarterbacks

It's that time of the year again.  I am doing my online research and scouring the internet for tidbits to get ready for this fall and winter's football season.  As a University of Kansas (KU) alum, I follow my Jayhawks and may go to a game this year.  No, I won't be traveling to Lawrence, Kansas, but I may catch the game in Morgantown, WV, thanks to the crazy geographical dispersion of universities across the major college sports conferences.  Can we use machine learning and large language models (LLMs) to determine the ideal sports conferences?  We could leverage factors like fan base, endorsements, level of competition, revenue, travel costs, and others to figure out which colleges should reside in which conferences, like the Big 12, Big Ten, and others.

Since this is a technology-centric blog entry, let's get back to the topic at hand.  I get excited about fantasy football every fall, and the pinnacle of any fantasy sports league is the player draft.  I have also been fascinated by how good these LLMs are at meeting my fantasy sports needs.  As someone who has played fantasy sports since 1996,  I have trusted sources.  For this blog entry,  I am going to use the rankings from Yahoo!  Fantasy Sports.  This is what I did.

Purpose:  Using the ChatGPT, Google Gemini Advanced, Meta AI, and MSN Copilot chatbots, I asked each one for the top 50 quarterbacks for this season.

Scope:  I focused on NFL quarterbacks because only 32 quarterbacks are playing any given week.  Unlike kickers, quarterbacks have a significant impact on a football game.  

Large Language Models Used:

AI Vendor        LLM
OpenAI ChatGPT   GPT-4o
Google Gemini    Gemini 1.5 Pro
Meta AI          Llama 3.
MSN Copilot      Unknown

The prompt I used for the four sites was:
"Give me the list of the top 50 quarterbacks for my fantasy American football league.  Here is the scoring key: 
Offense League                              Value (points)      
Passing Yards:                                  25 yards per point; 5 points at 350 yards 
Passing Touchdowns:                      
Interceptions:                                  -1 
Rushing Yards:                                 10 yards per point; 5 points at 200 yards 
Rushing Touchdowns:                      6 
Receptions:                                       1
Receiving Yards:                              10 yards per point; 5 points at 200 yards 
Receiving Touchdowns:                  
Return Touchdowns:                        6 
2-Point Conversions:                      
Fumbles Lost:                                 -2 
Offensive Fumble Return TD:         6"
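
For readers who want to see how a scoring key like this turns a stat line into points, here is a rough sketch covering only the quarterback-relevant rows. It assumes fractional points and a one-time bonus once the yardage threshold is reached, which the key does not spell out. The passing touchdown value is blank in the key above, so the function takes it as a required argument, and the 4 in the example is only a placeholder. The stat line itself is hypothetical.

```python
def qb_fantasy_points(passing_yards, passing_tds, interceptions,
                      rushing_yards, rushing_tds, fumbles_lost,
                      *, passing_td_value):
    """Score one quarterback stat line with the values listed in the prompt.

    `passing_td_value` must be supplied by the caller because the key
    leaves that row blank.
    """
    points = passing_yards / 25            # 25 passing yards per point
    if passing_yards >= 350:
        points += 5                        # bonus at 350 passing yards
    points += passing_tds * passing_td_value
    points += interceptions * -1           # -1 per interception
    points += rushing_yards / 10           # 10 rushing yards per point
    if rushing_yards >= 200:
        points += 5                        # bonus at 200 rushing yards
    points += rushing_tds * 6              # 6 per rushing touchdown
    points += fumbles_lost * -2            # -2 per fumble lost
    return points

# Hypothetical stat line: 300 passing yards, 2 passing TDs, 1 interception,
# 40 rushing yards, 1 rushing TD, no fumbles. The value 4 is a placeholder.
example = qb_fantasy_points(300, 2, 1, 40, 1, 0, passing_td_value=4)
print(example)  # 12 + 8 - 1 + 4 + 6 = 29.0
```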

I compared the LLM results with Yahoo!  Fantasy Sports' top 50 quarterbacks (QB).  As of August 16th, 2024, here is the list:
  1. Josh Allen - Buffalo Bills
  2. Jalen Hurts - Philadelphia Eagles
  3. Lamar Jackson - Baltimore Ravens
  4. Patrick Mahomes - Kansas City Chiefs
  5. Anthony Richardson - Indianapolis Colts
  6. C.J. Stroud - Houston Texans
  7. Kyler Murray - Arizona Cardinals
  8. Joe Burrow - Cincinnati Bengals
  9. Dak Prescott - Dallas Cowboys
  10. Jordan Love - Green Bay Packers
  11. Jayden Daniels - Washington Commanders (rookie)
NOTE: The rest of the list is here: Top 50 QBs - 8/16/2024 - Google Sheets

Here is how the various LLM models ranked these quarterbacks.
Yahoo! Rank   Name                 ChatGPT   Meta AI   MSN Copilot   Google Gemini
1             Josh Allen           1         2         2             1
2             Jalen Hurts          2         4         4             2
3             Lamar Jackson        4         3         3             4
4             Patrick Mahomes      3         1         1             3
5             Anthony Richardson   5         5         5             19
6             C.J. Stroud          6         6         6             28
7             Kyler Murray         8         8         8             10
8             Joe Burrow           7         7         7             6
9             Dak Prescott         9         10        10            14
10            Jordan Love          11        9         9             21
11            Jayden Daniels       17        11        11            NOT LISTED
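
One simple way to put a number on how closely each chatbot tracks the Yahoo! list is the average absolute gap between its rank and the Yahoo! rank. The sketch below uses only the eleven rows in the table above. The metric is my own choice for illustration, and the unlisted quarterback is skipped for Google Gemini rather than penalized.

```python
# Each row: (Yahoo! rank, ChatGPT, Meta AI, MSN Copilot, Google Gemini).
# None marks the quarterback that Google Gemini did not list.
RANKINGS = [
    (1, 1, 2, 2, 1),
    (2, 2, 4, 4, 2),
    (3, 4, 3, 3, 4),
    (4, 3, 1, 1, 3),
    (5, 5, 5, 5, 19),
    (6, 6, 6, 6, 28),
    (7, 8, 8, 8, 10),
    (8, 7, 7, 7, 6),
    (9, 9, 10, 10, 14),
    (10, 11, 9, 9, 21),
    (11, 17, 11, 11, None),
]
CHATBOTS = ["ChatGPT", "Meta AI", "MSN Copilot", "Google Gemini"]

def mean_absolute_rank_gap(rows, column):
    """Average of |chatbot rank - Yahoo! rank|, skipping unlisted players."""
    gaps = [abs(row[column] - row[0]) for row in rows if row[column] is not None]
    return sum(gaps) / len(gaps)

gaps_by_chatbot = {
    name: mean_absolute_rank_gap(RANKINGS, index)
    for index, name in enumerate(CHATBOTS, start=1)
}

for name, gap in gaps_by_chatbot.items():
    print(f"{name:<14} {gap:.2f}")
```

On these eleven rows, Google Gemini's average gap is several times larger than that of the other three chatbots, and Meta AI and MSN Copilot come out identical.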

It appears Google Gemini's results (https://gemini.google.com/app/1ab3b15202d9e4f1) did not incorporate the 2024 NFL Draft or the changes in free agency.

Unlike Google Gemini, ChatGPT, Meta AI, and MSN Copilot aggregated rankings from other sites.  These sources are listed in the various result sets.  Here are the links to the prompts and the appropriate result sets:
The three LLMs didn't do number crunching to generate the results but instead pulled data from other websites and aggregated the data.

Since I was doing a rudimentary analysis of the various LLMs to meet my needs with my upcoming Fantasy Football draft, I didn't write any code but simply leveraged existing algorithms in Microsoft Excel.   

Results:
Here is the summary of the results:

No.  Player               Team   Depth   Yahoo! Sports   AI Rank
1    Josh Allen           Buf    1       1               2
2    Jalen Hurts          Phi    1       2               4
3    Lamar Jackson        Bal    1       3               3
4    Patrick Mahomes      KC     1       4               1
5    Anthony Richardson   Ind    1       5               5
6    C.J. Stroud          Hou    1       6               6
7    Kyler Murray         Ari    1       7               8
8    Joe Burrow           Cin    1       8               7
9    Dak Prescott         Dal    1       9               10
10   Jordan Love          GB     1       10              9
11   Jayden Daniels       Was    1       11              11
12   Brock Purdy          SF     1       12              12
13   Jared Goff           Det    1       13              15
14   Tua Tagovailoa       Mia    1       14              13
15   Caleb Williams       Chi    1       15              16
16   Trevor Lawrence      Jax    1       16              14
17   Kirk Cousins         Atl    1       17              17
18   Matthew Stafford     LAR    1       18              19
19   Justin Herbert       LAC    1       19              18
20   Geno Smith           Sea    1       20              22
21   Aaron Rodgers        NYJ    1       21              23
22   Deshaun Watson       Cle    1       22              20
23   Baker Mayfield       TB     1       23              21
24   Will Levis           Ten    1       24              24
25   Derek Carr           NO     1       25              26
26   Taysom Hill          NO             26
27   Daniel Jones         NYG    1       27              25
28   Bryce Young          Car    1       28              28
29   Bo Nix               Den    1       29              30
30   Justin Fields        Pit    2       30              27
31   Russell Wilson       Pit    1       31              29
32   JJ McCarthy          Min    1       32              35.5
33   Jacoby Brissett      NE             33              33
34   Sam Darnold          Min            34              32
35   Gardner Minshew II   LV     1       35              34
36   Drake Maye           NE     1       36              31

Regarding the AI Rank, I excluded the Google Gemini results because the standard deviation (STDEV) and variance (VAR) functions in Excel showed they were outliers compared to the other results.  Here is the link to the spreadsheet (ai-fantasyqb.xlsx - Google Sheets)
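
The spreadsheet formulas are not reproduced in this post, so the following is only a sketch of that kind of check, using the eleven rows from the chatbot comparison table. Python's statistics.stdev and statistics.variance compute sample statistics, as Excel's STDEV and VAR do. Taking the median of the three remaining chatbots as a consensus rank is an assumption for illustration. It happens to reproduce the AI Rank column for these eleven rows, but the spreadsheet may calculate it differently.

```python
from statistics import median, stdev, variance

# Each row: (Yahoo! rank, ChatGPT, Meta AI, MSN Copilot, Google Gemini).
# None marks the quarterback that Google Gemini did not list.
RANKS = [
    (1, 1, 2, 2, 1),
    (2, 2, 4, 4, 2),
    (3, 4, 3, 3, 4),
    (4, 3, 1, 1, 3),
    (5, 5, 5, 5, 19),
    (6, 6, 6, 6, 28),
    (7, 8, 8, 8, 10),
    (8, 7, 7, 7, 6),
    (9, 9, 10, 10, 14),
    (10, 11, 9, 9, 21),
    (11, 17, 11, 11, None),
]

def summarize(rows):
    """Per-player spread with and without Gemini, plus a consensus rank."""
    summary = []
    for yahoo, chatgpt, meta, copilot, gemini in rows:
        three = [chatgpt, meta, copilot]
        four = three + [gemini] if gemini is not None else None
        summary.append({
            "yahoo": yahoo,
            "stdev_without_gemini": stdev(three),
            "stdev_with_gemini": stdev(four) if four else None,
            "var_with_gemini": variance(four) if four else None,
            "consensus": median(three),  # assumed consensus rule
        })
    return summary

summary = summarize(RANKS)
for row in summary:
    with_gemini = row["stdev_with_gemini"]
    shown = "n/a" if with_gemini is None else f"{with_gemini:.2f}"
    print(f"Yahoo! {row['yahoo']:>2}: stdev without Gemini "
          f"{row['stdev_without_gemini']:.2f}, with Gemini {shown}, "
          f"consensus {row['consensus']}")
```

Rows where the spread jumps once Gemini is included, such as Yahoo! ranks 5, 6, and 10, are the ones that mark Gemini as the outlier.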

Conclusion:
I didn't expect the diversity of the results from the four AI engines or how much these AI engines lean on third-party content from other respectable websites.  The other thing I didn't expect is that the results generated from their APIs are quite different since the APIs don't pull data from third-party websites.  In summary, I envision using something other than general-purpose AI chatbots and their LLMs to address my fantasy sports needs.  I expect sports vendors like ESPN, Yahoo!, and sporting organizations to have sophisticated AI, LLMs, and machine learning technologies, which will probably be more reliable for at least the next couple of years.

On a personal note, if you are an aspiring data scientist, don't be discouraged if you don't have a programming background or don't understand algorithms like backpropagation in neural networks.  You should be curious to learn from the data.

Friday, August 9, 2024

Large Language Models Asserting Themselves


Wow!  My last post on this blog was on July 26, 2020.  It was four years and 14 days ago, before President Joe Biden was elected, hordes of disgruntled individuals stormed the US Capitol, and ChatGPT asserted itself in the technology world.

What is ChatGPT?  It's OpenAI's chatbot.  It is based on artificial intelligence, specifically Large Language Models (LLMs).  If you ask ChatGPT, "Can you write me a blog entry about ChatGPT and LLM?" it gives the following answer: "Certainly! Here's a blog entry about ChatGPT and Large Language Models (LLMs)." Assuming you clicked on the link, you can see it did my work for me.  Initially, it was great, but the more I played around with it, the more I realized that, like any other technology, it has its benefits and drawbacks.

It's great for prototyping and brainstorming.  I would not, however, rely on it to develop new products that must meet appropriate quality measures.  For instance, I created outlines for strategy documents, but the selection and sequencing of the document sections were still based on what I was trying to communicate to the audience.  Recently, I began developing a board game for kids, and I wanted to create a quick prototype on the web using ReactJS.  The prototype is meant to refine the gameplay, so I didn't spend a lot of time architecting the ReactJS application.  Still, I realized that it wasn't easy to customize ChatGPT-generated code because I didn't have a good grasp of the ReactJS fundamentals.  I, therefore, decided to take a brief pause and focus on the fundamentals, including the syntax.  A few days ago, I got selected to be an AI tester for Outlier.ai.  It's fascinating how the backend works and how much refinement products like ChatGPT, Google Gemini, and others still need.  Since I was introduced to cybersecurity at work around the time of my previous blog post, I am interested in seeing how the cybersecurity landscape will evolve with LLMs.  Black hat hackers and other "baddies" are the creative ones since they are constantly exploiting technologies to make a profit.  I am specifically interested in how data, cyber, and IT systems can intersect to optimize the business without hurting the user experience.