Greg Hung

Learn Generative AI – Video, Music, Images & Hands-on Review

OpenAI’s ChatGPT is getting all the attention in the AI world, but other forms of generative AI are pretty cool as well. Generative video AI is a cutting-edge field that uses deep learning to generate unique, high-quality videos from just a script. It has a wide range of applications, from training videos and YouTube videos to promotional sales videos, and the AI avatars can even speak multiple languages fluently.

Generative AI refers to generating content (music, video, text, images) from a text prompt. Being able to use text prompts to create various forms of content is going to become a core skill. This is a hybrid article: I used a bit of help from Bing Chat to write it, combined with old-fashioned human blogging power.

Generative AI is a branch of artificial intelligence that focuses on creating new content or data from scratch, such as images, text, music, or speech. Generative AI has made remarkable progress in recent years, thanks to advances in deep learning, natural language processing, computer vision, and generative adversarial networks (GANs).

Some of the applications of generative AI include:

– Content creation: Generative AI can help artists, writers, musicians, and designers produce original and diverse content, such as paintings, poems, songs, logos, and more. For example, OpenAI’s DALL-E can generate realistic images from text descriptions, such as “a cat wearing a suit and tie”.

– Data augmentation: Generative AI can help researchers and practitioners augment their existing data sets with synthetic data, which can improve the performance and robustness of machine learning models. For example, NVIDIA’s StyleGAN can generate high-quality faces of people who do not exist, which can be used for face recognition or verification tasks.

– Data synthesis: Generative AI can help generate data that is otherwise difficult or expensive to obtain, such as medical images, weather forecasts, or financial scenarios. For example, Microsoft’s Turing-NLG can generate coherent and fluent text on any topic, which can be used for summarization, translation, or question answering tasks.

Generative AI is still an active and evolving field of research, with many challenges and opportunities ahead. Some of the challenges include:

– Ethical and social implications: Generative AI can pose risks to privacy, security, authenticity, and fairness. For example, generative AI can be used to create deepfakes, which are manipulated videos or images that appear real but are not. Deepfakes can be used for malicious purposes, such as spreading misinformation, impersonating someone, or blackmailing someone.

– Evaluation and quality control: Generative AI can be difficult to evaluate and measure objectively, as different users may have different preferences and expectations for the generated content or data. For example, generative AI can produce outputs that are novel but not relevant, or relevant but not novel. Moreover, generative AI can also produce outputs that are erroneous or harmful, such as offensive language or biased stereotypes.

– Explainability and transparency: Generative AI can be complex and opaque, making it hard to understand how and why it generates certain outputs. For example, generative AI can have hidden assumptions or biases that are not explicitly stated or controlled by the user. Moreover, generative AI can also have unintended consequences or side effects that are not anticipated or desired by the user.

Generative AI is a fascinating and promising field of artificial intelligence that has the potential to transform various domains and industries. However, generative AI also requires careful and responsible development and use, with respect to ethical principles and human values.

OK, now back to me to break down my experience with various forms of generative AI.

Video AI

You don’t need to be a programmer to start using video AI. You simply need a script and a paid subscription to synthesia.io, and you can start with a free trial if you want. ChatGPT doesn’t generate video yet, but it can generate images, so it is easy to imagine generative video built on ChatGPT technology. The first time I saw the result from Synthesia, I was blown away by how real the avatar looked. The paid plan was $30 for 10 credits, and 10 credits work out to about 10 minutes of video.

I’ve been testing some videos on my YouTube channel to get more familiar with the platform, starting with a financial market update video. The first video took some time because I had to modify an existing template; however, the interface is easy to use. If you can create a PowerPoint slide, you can create an AI video. In fact, you can even import a PowerPoint as a template. When you are happy with your video, you can generate it in MP4 format.


[Image: Learn AI Video]

Although Synthesia doesn’t translate, you can give it a script in over 120 languages and the avatar will deliver it fluently. You even have several voice choices per language, including a choice of accents!

This type of technology is a game changer for smaller companies that want to scale their business but don’t have a dedicated media team. Producing professional videos can be expensive and time-consuming, and now this technology offers an alternative. As a YouTube and online course creator, I can use it to reach new audiences in different languages and to scale up my Market Update videos without being the one talking on camera. It frees you from having to find a studio, set up lighting, and remember and deliver your lines. You don’t actually need a camera anymore, although a video editing program will be useful for making your own alterations to the video.

Although I can see this technology being widely used, it won’t necessarily replace the need to make videos yourself. I think there will be increasingly high demand for human-made videos, especially when it comes to building a personal or emotional connection. Certain types of videos, like comedy, will also be hard to replicate with AI for now.

Update: Synthesia.io has recently changed its policy on using its stock avatars for content about current events such as cryptocurrency or stocks, which is exactly what I was using them for in my YouTube videos. Instead, Synthesia encourages you to create your own avatar, either a studio custom avatar or a lower-quality webcam avatar.

The studio custom avatar is a paid add-on costing $1,000 per avatar per year, and if you are on a personal plan, you will need to upgrade to the annual personal plan, which is $270 per year.

Alternatively, the webcam avatar is made entirely with your webcam and comes free with the annual personal plan ($270). Webcam avatars tend to be lower quality, but with the right lighting and setup you can create some great avatars! I have since cancelled my subscription because of this policy change and the ongoing annual fee. I expect more text-to-video AI platforms to come to market shortly.
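To make the trade-off concrete, here is a small sketch of the yearly cost of each custom-avatar option. It uses only the prices quoted above ($270 annual personal plan, $1,000 per studio avatar per year) and assumes you are on the personal tier, where the annual plan is needed for either option; the function name `annual_cost` is hypothetical, and Synthesia’s actual pricing may have changed since this was written.

```python
# Prices in USD as quoted in this post; they may have changed since.
ANNUAL_PERSONAL_PLAN = 270   # assumed required for both avatar options on the personal tier
STUDIO_AVATAR_ADD_ON = 1000  # per studio avatar, per year


def annual_cost(studio_avatars: int) -> int:
    """Hypothetical helper: yearly total = annual personal plan + one add-on per studio avatar.

    The webcam avatar is included in the plan, so studio_avatars=0 models the webcam-only option.
    """
    return ANNUAL_PERSONAL_PLAN + STUDIO_AVATAR_ADD_ON * studio_avatars


print("Webcam avatar only:", annual_cost(0))
print("One studio avatar:", annual_cost(1))
```

With these numbers, a single studio avatar costs $1,270 per year in total, versus $270 for the webcam-only route.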

Update Nov 2023: Video AI platforms have advanced, and we now have the early stages of text-prompt-to-video with https://ai.invideo.io/. These platforms generate the script and voiceover and use relevant stock footage to help tell your story. The best part is that you can customize a lot and regenerate the video for your use. Watch this video for the latest.

I’ve recently subscribed to https://app.vidyo.ai/, which lets me upload a video and have AI pick out the best parts, then provides templates and captions so the clips are ready for short-form platforms like Instagram or TikTok.

Update Feb 2024: OpenAI has unveiled Sora, its text-to-video model. Based on what OpenAI has shown on its website, Sora perhaps has the most potential of any of these tools to disrupt multiple industries. The generated videos are limited to 1 minute, but their quality is astonishing. As a stock footage video creator, I can see that the entire industry is in the crosshairs of this technology. The examples on the site also include Pixar-quality animation that could threaten animators’ jobs. Will revenue then shift from creators into the hands of OpenAI? Check out my video for a more in-depth analysis.

Text to Image AI

This is the most widely covered form of generative AI because visual AI is amazing. In theory, you can create almost anything you can think of with a text prompt. In reality, you need some experience with each platform’s prompts and must learn to be specific. The current platforms include Midjourney, DALL-E 2, Microsoft Bing Image Creator powered by DALL-E, and now Adobe’s Photoshop Beta with Generative Fill and Adobe Firefly. Each platform has its own features, but all rely on a text prompt to generate the image. You can generate 2–4 samples, create variations from a sample, and eventually share or download your image. One trick is to look at existing images you like and modify your text prompt to use an art style you like. For example, here are the text prompts I used for this Tesla in Bing Image Creator:

“tesla on a pink background”

[Image: pink Tesla generated with Bing Image Creator]


“tesla on a pink background pixel art”


[Image: Tesla AI image]
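The art-style trick above amounts to keeping a base prompt and appending a style phrase to it. The sketch below generates such variants so you can paste each one into Bing Image Creator (or any other platform) by hand; it calls no image API. The function `style_variants` is a hypothetical helper, and the styles "watercolor" and "low poly 3d render" are illustrative assumptions; only "pixel art" comes from the example above.

```python
BASE_PROMPT = "tesla on a pink background"


def style_variants(base: str, styles: list[str]) -> list[str]:
    """Hypothetical helper: return the base prompt plus one variant per art style.

    Each style phrase is simply appended to the end of the base prompt.
    """
    return [base] + [f"{base} {style}" for style in styles]


# "pixel art" is from the post; the other two styles are illustrative assumptions.
variants = style_variants(BASE_PROMPT, ["pixel art", "watercolor", "low poly 3d render"])
for prompt in variants:
    print(prompt)
```

Appending the style at the end keeps the subject of the prompt unchanged, so any difference between the resulting images comes mostly from the style phrase.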

Midjourney – This has the most underground feel: it runs on Discord, and you have to pay a monthly fee to use it. It is also the most liberal platform, allowing you to create images of celebrities and other public figures, such as the viral image of the Pope wearing a puffy jacket. I created an image of Justin Trudeau using this platform.

Microsoft Bing powered by DALL-E – Microsoft made a $10 billion investment in OpenAI and got to use its technology in Bing search and Bing Image Creator. This is an easy-to-use program with some guardrails, and it is a great way to transition from Bing Chat to text-to-image generation.

Adobe Firefly – This is the newest platform and is changing by the week. Its text to image works much like the others, except that Adobe watermarks the images and embeds a digital ID to track the images it creates. Check out our YouTube video for tips on how to use it.

Adobe Photoshop Beta – Adobe has integrated text-to-image AI into Photoshop. You can create or remove objects after selecting an area. You can also extend your image canvas, and the program is smart enough to generate the rest. In my opinion, given Adobe’s existing ecosystem, this is just the beginning of creators leveraging AI to enhance their creativity. Check out the video for more coverage of the tool.

Creating AI Music

This became popular after AI Drake emerged with songs like “Heart on My Sleeve” and “Cold Outside”. Using voicify.ai, you can take an existing music track or YouTube video and convert it with popular AI cover voice models like Drake, Kanye West, Joe Biden, or even Donald Trump, and more.
